ABSTRACT


We present results in three different arenas of discrete mathematics. 


Let $La(n, H)$ denote the cardinality of the largest family on the Boolean lattice $B_n$ that does not contain $H$ as a subposet. Denote $\pi(H) := \lim_{n \to \infty} La(n, H)/\binom{n}{\lfloor n/2 \rfloor}$. A crown $\mathcal{O}_{2k}$, for $k \ge 2$, is a poset on 2 levels whose Hasse diagram is a cycle. Griggs and Lu (2009) showed $\pi(\mathcal{O}_{4k}) = 1$ for $k \ge 2$. Lu (2014) proved $\pi(\mathcal{O}_{2k}) = 1$ for odd $k \ge 7$. We prove that the maximum size of an $\mathcal{O}_6$-free family, when restricted to the middle two levels of $B_n$, is no greater than $1.56\binom{n}{\lfloor n/2 \rfloor}$. This section is joint work with Linyuan Lu.

The diffusion state distance (DSD) was introduced by Cao-Zhang-Park-Daniels-Crovella-Cowen-Hescott (2013) to capture functional similarity in protein-protein interaction networks. They proved the convergence of DSD for non-bipartite graphs. We extend the DSD to bipartite graphs using lazy random walks and consider the general $L_q$-version of DSD. We discovered a connection between the DSD $L_q$-distance and Green's function, which was studied by Chung and Yau (2000). Based on that, we computed the DSD $L_q$-distance for paths, cycles, and hypercubes, as well as for the random graphs $G(n, p)$ and $G(w_1, \ldots, w_n)$. We also examined the DSD distances of two biological networks. This section is joint work with Peter Chin, Linyuan Lu, and Amit Sinha.

Motivated by recent work on Turán problems for non-uniform hypergraphs, we study when a fixed non-uniform hypergraph $H$ occurs in random hypergraphs with high probability. To be more precise, for a given set of positive integers $R := \{k_1, k_2, \ldots, k_r\}$ and probabilities $p = (p_1, p_2, \ldots, p_r) \in [0,1]^r$, let $G^R(n, p)$ be the random hypergraph $G$ on $n$ vertices such that for $1 \le i \le r$ each $k_i$-subset of vertices appears as an edge of $G$ with probability $p_i$, independently. We ask for which probability vectors $p$ the hypergraph $G^R(n, p)$ almost surely contains a given subhypergraph $H$. Note that the Erdős-Rényi model $G(n, p)$ is the special case of $G^R(n, p)$ with $R = \{2\}$. The threshold for the occurrence of a fixed graph $H$ in $G(n, p)$ is well studied in the literature. We generalize these results to non-uniform hypergraphs. Surprisingly, those $p$ for which $G^R(n, p)$ almost surely contains $H$ form a convex region (depending on $H$) in a log-scale drawing. We also consider the associated extension problems. This section is joint work with Linyuan Lu.
CHAPTER 1


INTRODUCTION 


A graph $G = (V, E)$ is a set $V$ of vertices together with a set $E$ of pairs of distinct vertices called edges. A hypergraph $H = (V, E)$ is a set $V$ of vertices together with an edge set $E$ containing nonempty subsets of $V$. We say $H$ is $r$-uniform if every edge $e \in E(H)$ contains exactly $r$ distinct vertices. If $r = 2$, we just refer to it as a graph, or simple graph. In contrast, a non-uniform hypergraph has edges of varying vertex cardinalities.


1.1 POSETS 


A partially ordered set $G = (S, \le)$, or poset for short, is a set $S$ with a partial ordering $\le$. $G$ contains another poset $H = (S', \le')$ as a subposet if there exists an injective map $f: S' \to S$ such that for all $u, v \in S'$, if $u \le' v$ then $f(u) \le f(v)$. Posets can be represented with a Hasse diagram, a graph whose vertices are the elements of the poset (here, sets) and whose edges connect comparable pairs, where we suppress any edges implied by transitivity.

Let $[n] := \{0, 1, \ldots, n-1\}$. The poset of concern here is the Boolean lattice, defined as $B_n := (2^{[n]}, \subseteq)$ for $n \in \mathbb{N}$. Any family $\mathcal{F} \subseteq 2^{[n]}$ will be considered a subposet of $B_n$. For any poset $H$, we say $\mathcal{F}$ is $H$-free if $H$ is not a subposet of $\mathcal{F}$. For any $n \in \mathbb{N}$, $La(n, H)$ is defined to be the cardinality of the largest family $\mathcal{F} \subseteq B_n$ that is $H$-free. We may consider $La(n, H)$ as the poset analog of $ex(n, H)$ in Turán theory, which stands for the maximum number of edges in a graph on $n$ vertices that does not contain $H$ as a subgraph. Figure 1.1 shows the Hasse diagram of the Boolean lattice for $n = 2$.


Figure 1.1 The Hasse diagram for $B_2 = (2^{[2]}, \subseteq)$, also known as the diamond.


The history of determining $La(n, H)$ dates back to Sperner [68] in 1928, when he proved that $La(n, H) = \binom{n}{\lfloor n/2 \rfloor}$ if $H$ is a pair of comparable elements ($P_2$). That is, the maximum size of an antichain, a family in which no two members are comparable, is $\binom{n}{\lfloor n/2 \rfloor}$. In particular, the bound is attained by taking the middle row of $B_n$, which is the row with the most elements. If $n$ is odd, then either of the two middle rows, of sizes $\binom{n}{\lfloor n/2 \rfloor}$ and $\binom{n}{\lceil n/2 \rceil}$, will do. For convenience, the asymptotic value of $La(n, H)$ is abbreviated as
$$\pi(H) := \lim_{n \to \infty} \frac{La(n, H)}{\binom{n}{\lfloor n/2 \rfloor}}.$$
Griggs and Lu [44] conjectured that $\pi(H)$ exists and is an integer for all finite posets $H$. In Table 1.1, we summarize some of the posets for which $\pi(H)$ has already been determined.


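Table 1.1 Known values of $\pi(H)$ in the literature.

$H$ | Name | $\pi(H)$ | Reference
$A_1 \subset \cdots \subset A_k$ | chain or $P_k$ | $k-1$ | [34]
$A \subset B_i$ for $i \in [r]$ | $r$-fork or $V_r$ | 1 | [28]
$A \subset C, D$ and $B \subset C$ | $N$ | 1 | [42]
$A, B \subset C, D$ | butterfly | 2 | [29]
$A_1, \ldots, A_s \subset B_1, \ldots, B_t$ | $K_{s,t}$ | 2 | [50]
$A_1 \subset \cdots \subset A_k \subset B_1, \ldots, B_s$ | $P_k(s)$ | $k$ | [28]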


A chain of length $k$ is a sequence of sets $A_1 \subset A_2 \subset \cdots \subset A_k$. The height of a poset is the length of its longest chain. Bukh [11] proved that for any poset $H$ whose Hasse diagram is a tree of height $k$, $\pi(H) = k - 1$.

While $La(n, H)$ has been determined for some simple posets, there is still much work to be done, and improving bounds on $La(n, H)$ remains an active endeavor. The diamond (a copy of $B_2$, see Figure 1.1), for instance, has been especially scrutinized; Table 1.2 summarizes the recent improvements on its upper bound.


Table 1.2 The upper bound for $\pi(B_2)$, provided the limit exists.

Authors | Bound on $\pi(B_2)$ | Reference
Axenovich, Manske, and Martin | 2.283 | [3]
Griggs, Li, and Lu | 2.273 | [43]
Kramer, Martin, and Young | 2.26 | [57]
Grosz, Methuku, and Tompkins | 2.207 | [45]


We can also consider problems involving induced posets. We say $G$ contains $H$ as an induced subposet if there is an injective map $f: S' \to S$ such that for any $u, v \in S'$, $u \le' v$ if and only if $f(u) \le f(v)$. Then, $La^*(n, H)$ represents the cardinality of the largest family that does not contain $H$ as an induced subposet. Also, we can define
$$\pi^*(H) := \lim_{n \to \infty} \frac{La^*(n, H)}{\binom{n}{\lfloor n/2 \rfloor}}.$$
One can see that $La(n, H) \le La^*(n, H)$ and $\pi(H) \le \pi^*(H)$ for any given $H$, since a family that does not contain $H$ as a subposet cannot contain it as an induced subposet either. In general, determining $La^*(n, H)$ is a much more difficult task than determining $La(n, H)$ for a given poset $H$, and consequently there are not many known values of $\pi^*(H)$ in the literature. Carroll and Katona [13] showed $\pi^*(V_2) = 1$; Boehnlein and Jiang [9] proved that for $H$ whose Hasse diagram is a tree of height $k$, $\pi^*(H) = k - 1$.

Of particular interest to us, however, are crowns. A crown, denoted $\mathcal{O}_{2k}$ for $k \ge 2$, is a poset of height 2 whose Hasse diagram is a cycle of length $2k$. Often $\mathcal{O}_4$ is known as the butterfly. Figure 1.2 shows an example of a crown, namely the Hasse diagram of the 6-crown, which happens to be the middle two levels of $B_3$.

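Griggs and Lu [44] showed that for $k \ge 2$, $\pi(\mathcal{O}_{4k}) = 1$, and bounded $\pi(\mathcal{O}_{4k-2}) \le 1.707$. Later, Lu [59] extended this, proving that $\pi(\mathcal{O}_{2k}) = 1$ for odd $k \ge 7$. This leaves only $\mathcal{O}_6$ and $\mathcal{O}_{10}$ as the crowns $\mathcal{O}_{2k}$ for which $\pi(\mathcal{O}_{2k})$ remains unknown. In Chapter 2, we will present an upper bound for $\pi(\mathcal{O}_6)$ that carries an extra stipulation: the family is restricted to the middle two levels of $B_n$.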


Figure 1.2. The 6-crown, $\mathcal{O}_6$, or the middle two rows of $B_3$.


An important tool used in proving many of the above results is the Lubell function, which is defined as
$$h_n(\mathcal{F}) := \sum_{F \in \mathcal{F}} \frac{1}{\binom{n}{|F|}}.$$
Equivalently, the Lubell function is the expected number of elements of $\mathcal{F}$ that lie on a random full chain in $B_n$ (one that contains an element of each cardinality, and so has height $n + 1$), since a uniformly random full chain passes through any fixed $k$-set with probability $1/\binom{n}{k}$. The use of this function dates back to Lubell's [61] proof of Sperner's Theorem. Aside from arguments using the Lubell function, an increasingly popular tool is the method of flag algebras, which we will discuss in detail in Chapter 2 and use to prove our main result.
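To make the random-full-chain interpretation concrete, here is a minimal Python sketch, assuming families are given as collections of subsets of $\{0, \ldots, n-1\}$; the function names `lubell`, `random_full_chain`, and `lubell_monte_carlo` are ours and purely illustrative. It computes $h_n(\mathcal{F})$ exactly and estimates it by sampling full chains.

```python
import random
from fractions import Fraction
from math import comb


def lubell(family, n):
    """Exact Lubell function h_n(F) = sum over F in family of 1 / C(n, |F|)."""
    return sum((Fraction(1, comb(n, len(F))) for F in family), Fraction(0))


def random_full_chain(n, rng):
    """Uniformly random full chain {} = C_0 < C_1 < ... < C_n = [n] in B_n,
    built from a random ordering of the ground set {0, ..., n-1}."""
    order = list(range(n))
    rng.shuffle(order)
    return [frozenset(order[:i]) for i in range(n + 1)]


def lubell_monte_carlo(family, n, trials=10_000, seed=0):
    """Estimate h_n(F) as the average number of members of F on a random full chain."""
    rng = random.Random(seed)
    members = {frozenset(F) for F in family}
    hits = 0
    for _ in range(trials):
        hits += sum(1 for C in random_full_chain(n, rng) if C in members)
    return hits / trials


# The middle row of B_4 (all 2-subsets of {0, 1, 2, 3}) is an antichain that
# meets every full chain exactly once, so its Lubell value is 1.
middle_row = [frozenset({i, j}) for i in range(4) for j in range(i + 1, 4)]
print(lubell(middle_row, 4), lubell_monte_carlo(middle_row, 4, trials=1000))
```

For an antichain the estimate never exceeds 1 on any single chain, which is exactly the content of Lubell's proof of Sperner's Theorem.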


We exploit some linear algebra in the course of the flag algebra method. A real, symmetric $n \times n$ matrix $A$ is positive semidefinite if for any $x \in \mathbb{R}^n$,
$$x^T A x \ge 0.$$
An equivalent condition is that the eigenvalues of $A$ are all nonnegative. If $a, b \ge 0$ and the symmetric $n \times n$ matrix $B$ is also positive semidefinite, then $aA + bB$ is positive semidefinite as well, since $x^T(aA + bB)x = a\,x^TAx + b\,x^TBx \ge 0$.
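When a matrix has been rounded to rational entries, positive semidefiniteness can be certified exactly rather than with floating-point eigenvalues. The following is a minimal sketch of one such check, symmetric Gaussian elimination over `Fraction`s; the name `is_psd` is ours, and this is not necessarily the procedure used for the computations in Chapter 2.

```python
from fractions import Fraction


def is_psd(matrix):
    """Exact positive semidefiniteness test for a symmetric rational matrix.

    Symmetric Gaussian elimination: a negative pivot certifies that the matrix
    is not PSD; a zero pivot is allowed only if the rest of its row is zero;
    a positive pivot is eliminated by passing to its Schur complement.
    """
    m = [[Fraction(x) for x in row] for row in matrix]
    n = len(m)
    for k in range(n):
        pivot = m[k][k]
        if pivot < 0:
            return False
        if pivot == 0:
            if any(m[k][j] != 0 for j in range(k + 1, n)):
                return False
            continue
        for i in range(k + 1, n):
            factor = m[i][k] / pivot
            for j in range(k + 1, n):
                m[i][j] -= factor * m[k][j]
    return True


A = [[2, -1], [-1, 2]]   # eigenvalues 1 and 3
B = [[1, 1], [1, 1]]     # eigenvalues 0 and 2
C = [[1, 2], [2, 1]]     # eigenvalues -1 and 3
print(is_psd(A), is_psd(B), is_psd(C))

# A nonnegative combination aA + bB of PSD matrices stays PSD.
a, b = Fraction(3), Fraction(1, 2)
print(is_psd([[a * A[i][j] + b * B[i][j] for j in range(2)] for i in range(2)]))
```

Exact arithmetic matters here because a matrix with a zero eigenvalue, such as $B$, sits on the boundary of the PSD cone, where floating-point round-off can flip the answer.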


1.2 RANDOM WALKS ON GRAPHS


A walk of length $k$ is an alternating sequence of vertices and edges $v_0, e_1, v_1, \ldots, e_k, v_k$. We permit vertices and edges to be repeated. Let $d_v$ denote the degree, or number of neighbors, of a vertex $v \in V(G)$. A random walk of length $k$ begins at a specified vertex $v_0$, and for $i = 1, \ldots, k$ is traversed by randomly choosing a neighbor of $v_{i-1}$ (each having probability $1/d_{v_{i-1}}$ of being chosen) to be the next vertex $v_i$ in the sequence. Fixing $\alpha \in (0, 1)$, an $\alpha$-lazy random walk is a walk in which we remain at our present vertex $v$ with probability $\alpha$, or step to each of its neighboring vertices with probability $(1 - \alpha)/d_v$.
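The step rule of the $\alpha$-lazy random walk can be sketched as follows, assuming the graph is given as adjacency lists on vertices $0, \ldots, n-1$; the names `lazy_walk` and `lazy_transition_matrix` are hypothetical. The second function builds the corresponding transition matrix $\alpha I + (1 - \alpha)D^{-1}A$, the same matrix $T_\alpha$ that reappears in Chapter 3.

```python
import random


def lazy_walk(adj, start, k, alpha, seed=None):
    """Simulate k steps of the alpha-lazy random walk; return [v_0, ..., v_k].

    At each step the walk stays put with probability alpha and otherwise moves
    to a uniformly random neighbour, so each neighbour of v has probability
    (1 - alpha) / d_v.
    """
    rng = random.Random(seed)
    walk = [start]
    v = start
    for _ in range(k):
        if rng.random() >= alpha:      # move with probability 1 - alpha
            v = rng.choice(adj[v])
        walk.append(v)
    return walk


def lazy_transition_matrix(adj, alpha):
    """T_alpha = alpha * I + (1 - alpha) * D^{-1} A for vertices 0, ..., n-1."""
    n = len(adj)
    T = [[0.0] * n for _ in range(n)]
    for v in range(n):
        T[v][v] += alpha
        for w in adj[v]:
            T[v][w] += (1 - alpha) / len(adj[v])
    return T


path = [[1], [0, 2], [1]]      # adjacency lists of the path v0 - v1 - v2
print(lazy_walk(path, 0, 10, alpha=0.5, seed=1))
print(lazy_transition_matrix(path, 0.5)[1])
```

The laziness is what breaks the periodicity of walks on bipartite graphs: with $\alpha > 0$ the walk can return to its start at both even and odd times.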

A path is a walk in which no vertex is repeated. While there is a plethora of applications for walks and paths in graph theory, here we will only concern ourselves with their use in comparing the structures of graphs. A naive way of comparing networks is to look at the distribution of the lengths of the shortest paths between all vertex pairs. However, Cao et al. [12] noted that in protein-protein interaction (PPI) networks, most networks have small diameters; that is, most vertices are relatively close to all other vertices, so shortest-path lengths discriminate poorly between vertex pairs. Their solution was to employ a new metric, called the diffusion state distance, which is based on random walks. A drawback is that their result excludes bipartite graphs. In Chapter 3, we extend the diffusion state distance to cover all graphs using $\alpha$-lazy random walks instead.

In order to extend the result of Cao et al. [12], our methods center on Green's function. For a graph $G$, let $A = (a_{ij})$ denote its adjacency matrix, where $a_{ij} := 1$ if $v_iv_j \in E(G)$, and $a_{ij} := 0$ otherwise. Define $D = (d_{ij})$ to be the diagonal matrix such that $d_{ii} = d_{v_i}$ and $d_{ij} := 0$ for $i \ne j$. Also, let $I$ denote the $n \times n$ identity matrix, where $|V(G)| = n$. Then, the discrete Laplacian is defined as
$$L := I - D^{-1}A.$$
But $L$ is not symmetric. So to achieve symmetry, the normalized Laplacian is defined as
$$\mathcal{L} := I - D^{-1/2}AD^{-1/2}.$$
Note that $L$ and $\mathcal{L}$ have the same eigenvalues, since $\mathcal{L} = D^{1/2}LD^{-1/2}$ is similar to $L$. For a full overview of spectral graph theory refer to [15]. Then, Green's function is defined to be the left inverse of $L$ in a generalized sense: since $L$ is singular ($L\mathbf{1} = 0$), it is inverted only on the complement of its kernel.

Green's function first appeared in 1828, in a paper by George Green [41] using partial differential equations for applications to electricity and magnetism. William Thomson (Lord Kelvin) [70, 71] revisited Green's functions years later, bringing them more attention. Later, Chung and Yau [22] thoroughly explored the application of Green's function to graphs.
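The similarity $\mathcal{L} = D^{1/2}LD^{-1/2}$ is easy to confirm numerically. The sketch below (plain Python lists, adjacency lists on vertices $0, \ldots, n-1$, no isolated vertices; the helper names are ours) builds both Laplacians for a small path and checks the identity entrywise. Eigenvalues are not computed, to keep the example dependency-free; similar matrices share them automatically.

```python
import math


def laplacians(adj):
    """Return (L, calL) with L = I - D^{-1} A and calL = I - D^{-1/2} A D^{-1/2}.

    adj[i] lists the neighbours of vertex i; every degree is assumed positive.
    """
    n = len(adj)
    deg = [len(adj[i]) for i in range(n)]
    A = [[1 if j in adj[i] else 0 for j in range(n)] for i in range(n)]
    L = [[(1.0 if i == j else 0.0) - A[i][j] / deg[i] for j in range(n)]
         for i in range(n)]
    calL = [[(1.0 if i == j else 0.0) - A[i][j] / math.sqrt(deg[i] * deg[j])
             for j in range(n)] for i in range(n)]
    return L, calL


def similarity_transform(L, adj):
    """D^{1/2} L D^{-1/2}; this equals calL, so L and calL share eigenvalues."""
    s = [math.sqrt(len(nbrs)) for nbrs in adj]
    n = len(adj)
    return [[s[i] * L[i][j] / s[j] for j in range(n)] for i in range(n)]


path = [[1], [0, 2], [1]]          # the path v0 - v1 - v2
L, calL = laplacians(path)
S = similarity_transform(L, path)
print(L[0][1], L[1][0])            # L is not symmetric
print(max(abs(S[i][j] - calL[i][j]) for i in range(3) for j in range(3)))
```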


1.3 HYPERGRAPHS


The questions of the average behavior and the extremal behavior are frequently asked for many discrete objects, and they often motivate the growth of the discrete areas.

For a hypergraph $H = (V, E)$, define the edge type of $H$ as $R(H) := \{|F| : F \in E(H)\}$. For a fixed set $R$ of positive integers, we say a hypergraph $H$ is an $R$-graph if $R(H) \subseteq R$. We often denote by $H^R_n$ an $R$-graph on $n$ vertices.

The extremal problems of non-uniform hypergraphs are considered by Johnston and Lu [49]. They generalized several important properties of the Turán density to non-uniform hypergraphs: supersaturation, blow-up, suspension, etc. For a given $R$-graph $H$, the Turán density $\pi(H)$ is the smallest number $a$ such that, for any $\epsilon > 0$, any $R$-graph $G$ on $n$ vertices with Lubell value
$$h_n(G) := \sum_{F \in E(G)} \frac{1}{\binom{n}{|F|}}$$
of at least $a + \epsilon$ must contain a copy of $H$ for sufficiently large $n$. This definition generalizes the classical definition of the Turán density of $k$-uniform hypergraphs. For $R = \{2\}$, the graph case, Erdős-Stone-Simonovits proved $\pi(G) = 1 - \frac{1}{\chi(G) - 1}$ for any graph $G$ with chromatic number $\chi(G) \ge 3$. Johnston and Lu generalized the Erdős-Stone-Simonovits theorem to $\{1,2\}$-graphs and determined the value of $\pi(H)$ for every $\{1,2\}$-graph $H$. There are a few uniform hypergraphs whose Turán density has been determined: the Fano plane [40, 52], expanded triangles [53], 3-books, 4-books [39], $F_5$ [37], extended complete graphs [62], etc. In particular, Baber and Talbot [4] recently found the Turán density of many 3-uniform hypergraphs using flag algebra methods. However, no single value of $\pi(K_k^r)$ is known for any complete $r$-graph on $k$ vertices with $k > r \ge 3$. Turán conjectured [72] that $\pi(K_4^3) = 5/9$. Erdős [33] offered \$500 for determining any $\pi(K_k^r)$ with $k > r \ge 3$ and \$1000 for answering it for all $k$ and $r$. The upper bounds for $\pi(K_4^3)$ have been sequentially improved: 0.6213 (de Caen [30]), 0.5936 (Chung-Lu [21]), 0.56167 (Razborov [63], using the flag algebra method). For a more complete survey of methods and results on uniform hypergraphs see Peter Keevash's survey paper [51].

The question of average behavior asks when a fixed hypergraph $H$ will occur in a random hypergraph. For any fixed set $R$ of positive integers and any probability vector $p \in [0,1]^{|R|}$, we define the random hypergraph $G^R(n, p) = (V, E)$ with $V := [n]$, the set of the first $n$ positive integers; and for $r \in R$, an $r$-set $F \in \binom{[n]}{r}$ belongs to $E$ independently with probability $p_r$. Additionally, we write the probability that $G^R(n, p)$ satisfies a certain property $A$ as $\Pr[G^R(n, p) \models A]$. For $R = \{2\}$, this definition is precisely the Erdős-Rényi model $G(n, p)$ of classical random graphs, originally described in [35]. In recent years, the same concept has been generalized to uniform hypergraphs with $R = \{r\}$, such as in [54], [24], and [31]. A graph $H$ on $v$ vertices with $e$ edges is called balanced if $\rho(H') \le \rho(H)$ for every subgraph $H' \subseteq H$, where $\rho(H) = e/v$. Even stronger, $H$ is called strictly balanced if $\rho(H') < \rho(H)$ for all proper subgraphs $H' \subsetneq H$. Given a fixed strictly balanced graph $H$, the threshold for the occurrence of $H$ in $G^{\{2\}}(n, p)$ is given by $p = cn^{-v/e}$, by Alon and Spencer [1]. In this case, for any $c > 0$,
$$\lim_{n \to \infty} \Pr[G(n, p) \not\models H] = \exp\left(-c^e/|\mathrm{Aut}(H)|\right).$$
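The definitions above can be sketched in a few lines of Python. The following is an illustration only, assuming edge probabilities are passed as a dictionary `{r: p_r}` whose keys form $R$; all function names are hypothetical, and the balance and automorphism checks are brute force, so they are meant only for small $H$.

```python
import itertools
import math
import random


def sample_random_hypergraph(n, probs, seed=None):
    """Sample G^R(n, p) on V = {1, ..., n}: for each r in R = probs.keys(),
    every r-subset of V is an edge independently with probability probs[r]."""
    rng = random.Random(seed)
    edges = []
    for r, p_r in sorted(probs.items()):
        for e in itertools.combinations(range(1, n + 1), r):
            if rng.random() < p_r:
                edges.append(frozenset(e))
    return edges


def edge_type(edges):
    """R(H) = {|F| : F in E(H)}."""
    return {len(F) for F in edges}


def is_strictly_balanced(v, edges):
    """Check rho(H') < rho(H) = e/v for every proper subgraph H' of a graph H
    on vertices 0..v-1. Induced subgraphs on proper vertex subsets suffice,
    since deleting edges only lowers the density. Exponential in v."""
    e = len(edges)
    for k in range(1, v):
        for S in itertools.combinations(range(v), k):
            inside = sum(1 for a, b in edges if a in S and b in S)
            if inside * v >= e * k:       # rho(H[S]) >= rho(H)
                return False
    return True


def automorphism_count(v, edges):
    """|Aut(H)| by brute force over all v! vertex permutations."""
    E = {frozenset(e) for e in edges}
    return sum(1 for perm in itertools.permutations(range(v))
               if {frozenset(perm[x] for x in e) for e in E} == E)


def no_copy_limit(v, edges, c):
    """Limit of Pr[G(n, p) has no copy of H] at p = c * n^(-v/e), for strictly
    balanced H: the number of copies is asymptotically Poisson(c^e / |Aut(H)|)."""
    return math.exp(-c ** len(edges) / automorphism_count(v, edges))


triangle = [(0, 1), (1, 2), (0, 2)]
print(is_strictly_balanced(3, triangle), automorphism_count(3, triangle))
print(no_copy_limit(3, triangle, 1.0))       # equals exp(-1/6)
G = sample_random_hypergraph(6, {1: 0.5, 2: 0.3, 3: 0.1}, seed=42)
print(edge_type(G))
```

The value $c^e/|\mathrm{Aut}(H)|$ is simply the expected number of copies of $H$ at $p = cn^{-v/e}$, up to lower-order terms, since $n^v p^e = c^e$.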


CHAPTER 2 


6-CROWNS 


2.1 FLAG PRELIMINARIES 


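Recall that $h_n(\mathcal{F})$ refers to the Lubell function. Let $\mathcal{H}$ denote the family of all 6-crown-free posets consisting only of sets from the middle two rows of $B_5$. Define $p_H$ to be the probability that a random subset of the middle two rows of $B_5$ is isomorphic to $H$. One can observe that:
$$h_n(\mathcal{F}) = \sum_{H \in \mathcal{H}} p_H h_n(H) \tag{2.1}$$
$$\le \max_{H \in \mathcal{H}} h_n(H), \tag{2.2}$$
where the inequality holds because $\sum_{H \in \mathcal{H}} p_H = 1$.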


However, under specific circumstances, with a careful selection of constants $c_H$ for each $H \in \mathcal{H}$, we can potentially improve this bound as follows:

Proposition 2.1.
$$h_n(\mathcal{F}) \le \sum_{H \in \mathcal{H}} \left(p_H h_n(H) + c_H p_H\right) \tag{2.3}$$
$$= \sum_{H \in \mathcal{H}} \left(h_n(H) + c_H\right) p_H, \tag{2.4}$$
where the $c_H$ come from entries in a positive semidefinite matrix.


The values of $c_H$ are far from arbitrary. We will rigorously justify Proposition 2.1 with a stronger claim, Proposition 2.3, in the next section. For now, we note that the bound in Proposition 2.1 leads to our main result.


Theorem 2.2. For a 6-crown-free set family $\mathcal{F} \subseteq B_n$ whose sets are restricted to the middle two rows of $B_n$,
$$|\mathcal{F}| \le 1.56\binom{n}{\lfloor n/2 \rfloor}.$$


Properly choosing the values of $c_H$ is the most difficult task here. To do so, we employ the method of flag algebras, introduced in the next section.


2.2 FLAG ALGEBRAS 


The chief technique we use in the proof of Theorem 2.2 is that of flag algebras. Flag algebras are a strategy presently in vogue in discrete mathematics, particularly graph theory, that boils down to shrewd application of the Cauchy-Schwarz inequality. The prominence of flag algebras can be attributed to Razborov's [64] seminal paper on the method, in which he demonstrated their use with an abundance of applications, in particular improving numerous results in Turán theory. Since then, flag algebras have become increasingly popular, appearing in papers by Baber and Talbot [5] and Keevash [51] on hypergraph Turán theory, Balogh et al. [7] on hypercubes with forbidden subgraphs, and Kramer, Martin, and Young [57] on diamond-free posets, to name just a few.

We present the flag algebra strategy in the context of a more general, graph theoretic version of Proposition 2.1. First we introduce the appropriate, analogous notation. We follow the setup of Baber and Talbot [5], who used flag algebras to tackle hypergraphs. For a graph $G$ on $n$ vertices, the density of $G$ is given by
$$d(G) := \frac{|E(G)|}{\binom{n}{2}}.$$
This can be adapted to a $k$-uniform hypergraph by amending the denominator to $\binom{n}{k}$, but we stick to the classic graph for simplicity. In the poset case we use the Lubell function instead. A graph $G$ is said to be $F$-free if there is no subgraph of $G$ that is isomorphic to $F$. Let $\mathcal{H}$ be the family of graphs on $\ell$ vertices that are $F$-free. Then for any graph $H \in \mathcal{H}$ and an $F$-free graph $G$, define $p(H; G)$ to be the probability that the subgraph induced by a randomly chosen subset of $\ell$ vertices from $G$ is isomorphic to $H$. (Note that we used simply $p_H$ earlier instead of $p(H; G)$ because the host was fixed.)


Proposition 2.3 (Baber and Talbot [5]).
$$d(G) \le \sum_{H \in \mathcal{H}} \left(d(H) + c_H\right) p(H; G) + o(1).$$
In particular,
$$d(G) \le \max_{H \in \mathcal{H}} \left(d(H) + c_H\right) + o(1).$$


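Proof. We reproduce their proof for completeness, and in particular because it highlights the flag algebra setup. From the definition of density, observe that
$$d(G) = \sum_{H \in \mathcal{H}} d(H)\, p(H; G), \tag{2.6}$$
since every pair of vertices of $G$ lies in the same number of $\ell$-subsets, so the density of $G$ is the average density of its induced subgraphs on $\ell$ random vertices. And so,
$$d(G) \le \max_{H \in \mathcal{H}} d(H). \tag{2.7}$$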
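Identity (2.6) can be verified by brute force on tiny graphs. The sketch below groups the induced $\ell$-vertex subgraphs of $G$ into isomorphism classes by a naive canonical form and sums $d(H)\,p(H; G)$; the function names are ours, and no $F$-freeness restriction is imposed, so every class that occurs plays the role of an $H \in \mathcal{H}$.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations, permutations
from math import comb


def canonical_form(S, edges):
    """Isomorphism-invariant label of the induced subgraph G[S]: the
    lexicographically least sorted edge list over all relabellings of S."""
    S = list(S)
    inside = [(a, b) for a, b in edges if a in S and b in S]
    best = None
    for perm in permutations(range(len(S))):
        relabel = dict(zip(S, perm))
        key = tuple(sorted(tuple(sorted((relabel[a], relabel[b]))) for a, b in inside))
        if best is None or key < best:
            best = key
    return best


def induced_profile(n, edges, ell):
    """p(H; G) for every isomorphism class H of ell-vertex induced subgraphs of G."""
    counts = Counter(canonical_form(S, edges) for S in combinations(range(n), ell))
    return {H: Fraction(c, comb(n, ell)) for H, c in counts.items()}


def density_from_profile(profile, ell):
    """Right-hand side of (2.6): sum over H of d(H) p(H; G)."""
    return sum((Fraction(len(H), comb(ell, 2)) * p for H, p in profile.items()), Fraction(0))


cycle5 = [(i, (i + 1) % 5) for i in range(5)]
profile = induced_profile(5, cycle5, 3)
print(Fraction(len(cycle5), comb(5, 2)), density_from_profile(profile, 3))
print(max(Fraction(len(H), comb(3, 2)) for H in profile))   # the bound (2.7)
```

For the 5-cycle both sides of (2.6) equal $1/2$, while (2.7) only gives $2/3$; the gap is exactly what the flag algebra correction terms $c_H$ are meant to close.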


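Note that (2.6) and (2.7) are the analogs of (2.1) and (2.2), respectively. However, these bounds can be improved if we exploit the intersection of two subgraphs $H, H' \in \mathcal{H}$, whereas (2.6) and (2.7) only capture information on disjoint subgraphs. Let $\theta: [\ell] \to V(H)$ be a bijection. Then we define a flag to be an ordered pair $F_\theta = (H, \theta)$. Further, if $\sigma$ is a flag (in this role called a type), we say $F_\theta$ is a $\sigma$-flag if the labelled subgraph induced by the first $|\sigma|$ labels of $F_\theta$ is isomorphic to $\sigma$. Next we denote by $\mathcal{F}^\sigma_m$ the set of all $\sigma$-flags, up to isomorphism, of order $m \le (\ell + |\sigma|)/2$. The upper bound on $m$ is necessary for attaining subgraphs that intersect nontrivially, namely in $|\sigma|$ vertices: two $m$-sets sharing exactly $|\sigma|$ vertices span $2m - |\sigma| \le \ell$ vertices, and so fit inside a single graph on $\ell$ vertices. Next, denote by $\Theta$ the set of all injective maps $\theta: [|\sigma|] \to V(G)$. Then if $F \in \mathcal{F}^\sigma_m$ and $\theta \in \Theta$, let $p(F, \theta; G)$ be the probability that a random subset $V'$ with $|V'| = m$ and $\mathrm{im}(\theta) \subseteq V' \subseteq V(G)$ induces a $\sigma$-flag $(G[V'], \theta)$ isomorphic to $F$.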




Now, if $F_a, F_b \in \mathcal{F}^\sigma_m$ and $\theta \in \Theta$, we set $p(F_a, F_b, \theta; G)$ to be the probability that, given two randomly chosen $m$-subsets $V'_a$ and $V'_b$ of $V(G)$ such that $\mathrm{im}(\theta) \subseteq V'_a, V'_b$ and $V'_a \cap V'_b = \mathrm{im}(\theta)$, the flags $(G[V'_a], \theta)$ and $(G[V'_b], \theta)$ are isomorphic to $F_a$ and $F_b$, respectively. While it would be rather convenient if
$$p(F_a, \theta; G)\, p(F_b, \theta; G) = p(F_a, F_b, \theta; G), \tag{2.8}$$
this is not always true, because the vertices are chosen with replacement on the left side but without replacement on the right side of (2.8). However, this is not an issue if we assume $V(G)$ to be sufficiently large, as Baber and Talbot proved in a special case of a lemma from Razborov [64]:


Lemma 2.4 (Baber and Talbot [5]). For any $F_a, F_b \in \mathcal{F}^\sigma_m$ and $\theta \in \Theta$,
$$p(F_a, \theta; G)\, p(F_b, \theta; G) = p(F_a, F_b, \theta; G) + o(1).$$
Here the $o(1)$ term goes to zero as $|V(G)| \to \infty$.


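Consequently, if $\theta \in \Theta$ is chosen uniformly at random,
$$\mathbb{E}_{\theta \in \Theta}\left[p(F_a, \theta; G)\, p(F_b, \theta; G)\right] = \mathbb{E}_{\theta \in \Theta}\left[p(F_a, F_b, \theta; G)\right] + o(1).$$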


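Moreover, if we let $\Theta_H$ denote the set of all injective functions $\theta: [|\sigma|] \to V(H)$, then
$$\mathbb{E}_{\theta \in \Theta}\left[p(F_a, F_b, \theta; G)\right] = \sum_{H \in \mathcal{H}} \mathbb{E}_{\theta \in \Theta_H}\left[p(F_a, F_b, \theta; H)\right] p(H; G).$$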


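Next suppose $Q = (q_{ab})$ is a positive semidefinite matrix of dimension $|\mathcal{F}^\sigma_m|$. Define the vector $p_\theta := (p(F, \theta; G) : F \in \mathcal{F}^\sigma_m)$. And so,
$$\mathbb{E}_{\theta \in \Theta}\left[p_\theta^T Q p_\theta\right] = \sum_{H \in \mathcal{H}} c_H(\sigma, m, Q)\, p(H; G) + o(1),$$
where
$$c_H(\sigma, m, Q) = \sum_{F_a, F_b \in \mathcal{F}^\sigma_m} q_{ab}\, \mathbb{E}_{\theta \in \Theta_H}\left[p(F_a, F_b, \theta; H)\right].$$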
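Next, suppose $\sigma_i$ is a type, $m_i \le \frac{1}{2}(\ell + |\sigma_i|)$, and $Q_i$ is a positive semidefinite matrix of dimension $|\mathcal{F}^{\sigma_i}_{m_i}|$. Fix $t \in \mathbb{N}$ to be the number of choices for $(\sigma_i, m_i, Q_i)$. Then, for any $H \in \mathcal{H}$ define:
$$c_H := \sum_{i=1}^{t} c_H(\sigma_i, m_i, Q_i).$$
But exploiting that each matrix $Q_i$ is positive semidefinite, so that $p_\theta^T Q_i p_\theta \ge 0$ for every $\theta$, we obtain:
$$\sum_{H \in \mathcal{H}} c_H\, p(H; G) + o(1) \ge 0.$$
And so, adding this to (2.6),
$$d(G) \le \sum_{H \in \mathcal{H}} \left(d(H) + c_H\right) p(H; G) + o(1).$$
But since $\sum_{H \in \mathcal{H}} p(H; G) = 1$, it follows that
$$d(G) \le \max_{H \in \mathcal{H}} \left(d(H) + c_H\right) + o(1).$$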


Returning to the poset setting, by changing density to the Lubell function, Proposition 2.3 implies Proposition 2.1.


2.3 PROOF OF THEOREM 2.2


Figure 2.1. The middle two levels of $B_5$.


We identify $\mathcal{H}$ as the set of all posets that do not contain a 6-crown and whose members are restricted to the middle two rows of $B_5$ (shown in Figure 2.1). For convenience, we may refer to any $H \in \mathcal{H}$ by the graph induced by its Hasse diagram.
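Because a family on two consecutive levels has a bipartite Hasse diagram, a copy of $\mathcal{O}_6$ in it is just a 6-cycle between three lower and three upper sets. The brute-force test below is a minimal sketch of that observation (the names are ours; it is not the program used to enumerate the 8400 graphs), with sets represented as `frozenset`s over $\{0, \ldots, n-1\}$.

```python
from itertools import combinations, permutations


def middle_two_levels(n):
    """Sets of sizes floor(n/2) and floor(n/2) + 1 over the ground set {0, ..., n-1}."""
    k = n // 2
    return [frozenset(c) for r in (k, k + 1) for c in combinations(range(n), r)]


def contains_six_crown(family):
    """Test whether a family living on two consecutive levels contains O_6.

    On two levels, a copy of O_6 is exactly a 6-cycle a < x > b < y > c < z > a
    in the bipartite containment graph between lower sets a, b, c and upper
    sets x, y, z, so brute force over such triples is enough.
    """
    if not family:
        return False
    low = min(len(F) for F in family)
    lower = [F for F in family if len(F) == low]
    upper = [F for F in family if len(F) == low + 1]
    for a, b, c in combinations(lower, 3):
        for x, y, z in permutations(upper, 3):
            if a < x and b < x and b < y and c < y and c < z and a < z:
                return True
    return False


crown = middle_two_levels(3)            # Figure 1.2: the 6-crown itself
print(contains_six_crown(crown), contains_six_crown(crown[:-1]))
```

The reduction to 6-cycles uses that an injective, order-preserving map sends each comparable pair of $\mathcal{O}_6$ to a strictly nested pair, so minimal elements must land on the lower level.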
To determine the flags, using the notation established with Proposition 2.3, we set $G$ as the middle two levels of $B_5$, and $\ell = 10$. There are 8400 such graphs, up to isomorphism, found with computer assistance. We denote these graphs as $H_0, H_1, \ldots, H_{8399}$, sorted in increasing order of their Lubell function. A few of these graphs appear in Figures 2.2 and 2.3.


Remark 2.5. Among these 8400 graphs, the highest Lubell value is 1.6, which is attained only by $H_{8399}$. Hence, for a 6-crown-free family $\mathcal{F}$ on the middle two rows, (2.2) gives $h_n(\mathcal{F}) \le 1.6$, and since every set in the middle two rows has weight at least $1/\binom{n}{\lfloor n/2 \rfloor}$ in $h_n$,
$$|\mathcal{F}| \le 1.6\binom{n}{\lfloor n/2 \rfloor}.$$

Now the families $\{\{0,1\}\}$ and $\{\{1,2\}\}$ are isomorphic, but neither of these is isomorphic to $\{\{0,1,2\}\}$, as we do care to differentiate between the upper and lower rows. The difference will also be exploited in the use of the duals of the graphs. The dual of $H \in \mathcal{H}$ is an inverted copy of $H$. That is, rather than considering $H \in (2^{[n]}, \subseteq)$, we find the dual of $H$ by instead considering $H$ a member of $(2^{[n]}, \supseteq)$. In Figure 2.2, $H_{7202}$ is the dual of $H_{7968}$, and vice versa. Also, $H_{8399}$ is self-dual.

Now we can simply use $p_H$ rather than $p(H; G)$ since $G$ is fixed. Set also $m = 4$, so the flags are composed of copies of $B_2$. Finally, $|\sigma| = 1$, as the flags only overlap in one element, namely $\{0\}$. Each of $\{0,1\}$, $\{0,2\}$, and $\{0,1,2\}$ is either included or excluded, giving $2^3 = 8$ choices, which reduce to 6 up to swapping $\{0,1\}$ and $\{0,2\}$. The result is a total of 6 flags, shown in Figure 2.4. Vertices that are included are shaded black; hollow vertices are excluded.

Using computer assistance, a multiplication table for all possible pairs of flags was produced. As an example, consider the product of $p_1$ and $p_2$. The multiplication process is shown in Figure 2.5. The element $\{0\}$ is fixed in both flags, but we place $p_2$ on the ground elements $0, 3, 4$ rather than $0, 1, 2$. Their product is the linear combination of all subgraphs of the middle two levels of $B_5$ that contain the sets $\{0,1\}$ (from $p_1$) and $\{0,3,4\}$ (from $p_2$) but not $\{0,2\}$, $\{0,1,2\}$, $\{0,3\}$, nor $\{0,4\}$. Note that the element $\{0\}$ can be disregarded at this point, as it does not lie in the middle two rows of $B_5$. The Hasse diagram for this product is given in Figure 2.5, where the 14 elements that may or may not be included are shaded gray. Without identifying isomorphic copies, there are $2^{14}$ such graphs with this prescribed property. However, even before considering isomorphisms, note that not all of these $2^{14}$ graphs are realizable, as many will contain a 6-crown. The coefficients in this linear combination are the numbers of copies of each graph from $\mathcal{H}$, counting isomorphic copies.

Figure 2.2. Some examples of the graphs $H_i$.


The resulting multiplication table has its columns indexed by the graphs and its rows organized by pairwise products of flags. An abbreviated version of the table is provided in Table 2.1, transposed for formatting constraints. After determining the products of all pairs of flags (which are commutative), we are able to set up a semidefinite program. The program has the form:

minimize $v$
subject to $v \ge h_n(H_i) + c_{H_i}$ for all $i \in [8400]$,
$v \in \mathbb{R}$, $Q$ is positive semidefinite.

Figure 2.3. More examples of the graphs $H_i$.
Figure 2.4 The flags.

Figure 2.5 The product of flags $p_1$ and $p_2$.


There are 6 flags, so a $6 \times 6$ positive semidefinite matrix will be computed, having the form
$$Q = \begin{pmatrix}
q_{11} & q_{12} & q_{13} & q_{14} & q_{15} & q_{16} \\
q_{21} & q_{22} & q_{23} & q_{24} & q_{25} & q_{26} \\
q_{31} & q_{32} & q_{33} & q_{34} & q_{35} & q_{36} \\
q_{41} & q_{42} & q_{43} & q_{44} & q_{45} & q_{46} \\
q_{51} & q_{52} & q_{53} & q_{54} & q_{55} & q_{56} \\
q_{61} & q_{62} & q_{63} & q_{64} & q_{65} & q_{66}
\end{pmatrix},$$
which is symmetric ($q_{ij} = q_{ji}$) and positive semidefinite.

The program was then solved using the CSDP solver [10]. Now $c_{H_i}$ is computed by multiplying the vector
$$(q_{11}, 2q_{12}, 2q_{13}, \ldots, 2q_{16}, q_{22}, 2q_{23}, \ldots, 2q_{56}, q_{66})$$
with the corresponding column for $H_i$ (transposed to a row here) in the multiplication table, Table 2.1. So the result has the form
$$c_{H_i} = \sum_{1 \le j \le k \le 6} a_{jk}\, q_{jk}$$
for some $a_{jk} \in \mathbb{R}$.

Table 2.1. Number of copies of $H_i$ obtained from combining flags $p_j$ and $p_k$.

$H_i$ | $p_2p_2$ | $p_2p_3$ | $p_2p_4$ | $p_2p_5$ | $p_3p_3$ | $p_3p_4$ | $p_3p_5$ | $p_4p_4$ | $p_4p_5$ | $p_5p_5$
$H_{4395}$ | 120 | 0 | 0 | 0 | 120 | 0 | 0 | 0 | 0 | 0
$H_{5904}$ | 120 | 0 | 0 | 0 | 120 | 0 | 0 | 0 | 0 | 0
$H_{7202}$ | 24 | 0 | 24 | 0 | 80 | 0 | 16 | 0 | 0 | 8
$H_{7968}$ | 24 | 0 | 24 | 0 | 80 | 0 | 16 | 0 | 0 | 8
$H_{7971}$ | 0 | 0 | 24 | 4 | 64 | 0 | 20 | 4 | 0 | 16
$H_{7977}$ | 0 | 12 | 12 | 0 | 48 | 12 | 0 | 12 | 0 | 24
$H_{8051}$ | 0 | 12 | 12 | 0 | 48 | 12 | 0 | 12 | 0 | 24
$H_{8294}$ | 0 | 0 | 24 | 4 | 64 | 0 | 20 | 4 | 0 | 16
$H_{8385}$ | 0 | 0 | 0 | 20 | 40 | 0 | 20 | 20 | 0 | 40
$H_{8398}$ | 0 | 0 | 0 | 20 | 40 | 0 | 20 | 20 | 0 | 40
$H_{8399}$ | 0 | 0 | 0 | 0 | 16 | 16 | 0 | 0 | 32 | 32
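The doubling of the off-diagonal entries in the vector above comes from the commutativity of flag products: $p_jp_k$ and $p_kp_j$ are the same column, so each unordered pair is listed once and $q_{jk}$ and $q_{kj}$ are merged. The sketch below illustrates this bookkeeping with a hypothetical $3 \times 3$ matrix and a toy "column" (it is not the matrix $Q$ from the SDP, uses 0-based indices, and does not address how the raw counts of Table 2.1 are normalized): with column entries $x_jx_k$, the weighted sum reproduces the quadratic form $x^TQx$.

```python
from itertools import combinations_with_replacement


def weight_vector(Q):
    """(q11, 2q12, ..., 2q1k, q22, 2q23, ..., qkk), indexed by flag pairs j <= l."""
    k = len(Q)
    return {(j, l): (1 if j == l else 2) * Q[j][l]
            for j, l in combinations_with_replacement(range(k), 2)}


def c_value(Q, column):
    """Dot product of the weight vector with one column of the multiplication
    table; `column` maps a flag pair (j, l), j <= l, to its table entry."""
    return sum(w * column.get(pair, 0) for pair, w in weight_vector(Q).items())


def quadratic_form(Q, x):
    """x^T Q x, for comparison."""
    k = len(Q)
    return sum(Q[j][l] * x[j] * x[l] for j in range(k) for l in range(k))


Q = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]   # hypothetical symmetric PSD matrix
x = [1, 2, 3]
column = {(j, l): x[j] * x[l] for j in range(3) for l in range(j, 3)}
print(quadratic_form(Q, x), c_value(Q, column))
```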


Now from Proposition 2.1,
$$h_n(\mathcal{F}) \le \max_{H \in \mathcal{H}} \left(h_n(H) + c_H\right). \tag{2.9}$$
In order to drive the bound lower, the entire process can be repeated using the duals of the graphs instead. Figure 2.6 shows how the flags for the dual case are simply those in Figure 2.4 turned upside down, rooted in the $\binom{[5]}{4}$ level of $B_5$ rather than the $\binom{[5]}{1}$ level.

Figure 2.6 A comparison of the flags for the original (left) and dual (right) cases.

As a result, we obtain a new inequality,
$$h_n(\mathcal{F}) \le \max_{H \in \mathcal{H}} \left(h_n(H) + c'_H\right), \tag{2.10}$$
where $c'_H$ denotes the constants obtained from the dual flags. Next, since (2.4) holds for both choices of constants, we can average the two weighted sums behind (2.9) and (2.10) to obtain another bound,
$$h_n(\mathcal{F}) \le \max_{H \in \mathcal{H}} \left(h_n(H) + \frac{c_H + c'_H}{2}\right). \tag{2.11}$$


The resulting matrix $Q_0$ was obtained after rounding the entries of the solver's output to reasonably close rational numbers. With computer assistance, $Q_0$ is confirmed to be positive semidefinite. A summary of the results is given in Table 2.2, including only the graphs with a Lubell value of 1.5 or 1.6.

From (2.11), the bound $|\mathcal{F}| \le 1.56\binom{n}{\lfloor n/2 \rfloor}$ follows.


Table 2.2 Relevant results for graphs $H$ with large Lubell value.

$H$ | $h_n(H)$ | $(c_H + c'_H)/2$ | $h_n(H) + (c_H + c'_H)/2$
$H_{8385}$ | 1.5 | 0.025 | 1.525
$H_{8386}$ | 1.5 | 0.01167 | 1.51167
$H_{8387}$ | 1.5 | -0.02333 | 1.47667
$H_{8388}$ | 1.5 | 0.01 | 1.51
$H_{8389}$ | 1.5 | -0.08 | 1.42
$H_{8390}$ | 1.5 | 0 | 1.5
$H_{8391}$ | 1.5 | -0.05667 | 1.44333
$H_{8392}$ | 1.5 | 0 | 1.5
$H_{8393}$ | 1.5 | -0.05667 | 1.44333
$H_{8394}$ | 1.5 | -0.08 | 1.42
$H_{8395}$ | 1.5 | 0.01 | 1.51
$H_{8396}$ | 1.5 | -0.02333 | 1.47667
$H_{8397}$ | 1.5 | 0.01167 | 1.51167
$H_{8398}$ | 1.5 | 0.025 | 1.525
$H_{8399}$ | 1.6 | -0.04 | 1.56
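As a sanity check on the final step, the sketch below takes the entries of Table 2.2 verbatim, confirms that each row's last column equals the sum of the previous two, and evaluates the maximum in (2.11) over the listed rows. It covers only the graphs with Lubell value 1.5 or 1.6 shown in the table; the full bound needs the same computation over all 8400 graphs. The dictionary name `TABLE_2_2` and the function name are ours.

```python
from fractions import Fraction

# (h_n(H), (c_H + c'_H)/2, h_n(H) + (c_H + c'_H)/2) for the rows of Table 2.2.
TABLE_2_2 = {
    8385: ("1.5", "0.025", "1.525"),
    8386: ("1.5", "0.01167", "1.51167"),
    8387: ("1.5", "-0.02333", "1.47667"),
    8388: ("1.5", "0.01", "1.51"),
    8389: ("1.5", "-0.08", "1.42"),
    8390: ("1.5", "0", "1.5"),
    8391: ("1.5", "-0.05667", "1.44333"),
    8392: ("1.5", "0", "1.5"),
    8393: ("1.5", "-0.05667", "1.44333"),
    8394: ("1.5", "-0.08", "1.42"),
    8395: ("1.5", "0.01", "1.51"),
    8396: ("1.5", "-0.02333", "1.47667"),
    8397: ("1.5", "0.01167", "1.51167"),
    8398: ("1.5", "0.025", "1.525"),
    8399: ("1.6", "-0.04", "1.56"),
}


def right_hand_side_2_11(table):
    """Return (i, value) maximizing h_n(H_i) + (c_H + c'_H)/2 over the listed
    rows, after checking each row's last column against the first two."""
    best = None
    for i, (h, c_avg, total) in table.items():
        value = Fraction(h) + Fraction(c_avg)
        if value != Fraction(total):
            raise ValueError(f"row H_{i} is inconsistent")
        if best is None or value > best[1]:
            best = (i, value)
    return best


i, value = right_hand_side_2_11(TABLE_2_2)
print(f"max over Table 2.2 rows: H_{i} with {float(value)}")
```

Decimal strings are converted with `Fraction` so that the comparisons are exact at the precision printed in the table.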


Remark 2.6. After finishing this proof, we discovered that a better bound has already been proven. Restricting $\mathcal{F}$ to 2 rows, Kramer [56] showed $|\mathcal{F}| \le (2\sqrt{3} - 2)\binom{n}{\lfloor n/2 \rfloor}$ without the use of flag algebras.
2.4 FUTURE WORK


Maintaining the restriction to the middle two levels of $B_n$, it seems plausible that the bound can be pushed further down. This could potentially be accomplished by using larger flags (namely $B_3$), though this would be more computationally intensive. Short of getting a bound for $\pi(\mathcal{O}_6)$ outright without any restrictions, it may be worthwhile to attempt a bound assuming a restriction to the middle three levels. Flag algebras seem to be the most promising method to use.

It would also be interesting to see whether flag algebras could give us any nontrivial bounds on $\mathcal{O}_{10}$, even if it requires another restriction to the middle two levels. However, if we employed a similar setup to that presented here, the flags would consist of the middle two levels of $B_3$, of which there are 20, up to isomorphism. Then the product of any two flags yields a vast number of possible graphs, before accounting for isomorphisms or checking for copies of $\mathcal{O}_{10}$. Thus, this appears to be more computationally daunting.
CHAPTER 3
COMPUTING DIFFUSION STATE DISTANCE USING
GREEN'S FUNCTION AND HEAT KERNEL ON GRAPHS¹


3.1 INTRODUCTION 


Recently, the diffusion state distance (DSD, for short) was introduced in [12] to capture functional similarity in protein-protein interaction (PPI) networks. Based on evidence presented in [12], the diffusion state distance is much more effective than the classical shortest-path distance for the problem of transferring functional labels across nodes in PPI networks. The definition of DSD is purely graph theoretic and is based on random walks.

Let $G = (V, E)$ be a simple undirected graph on the vertex set $\{v_1, v_2, \ldots, v_n\}$. For any two vertices $u$ and $v$, let $He^{\{k\}}(u, v)$ be the expected number of times that a random walk starting at node $u$ and proceeding for $k$ steps will visit node $v$. Let $He^{\{k\}}(u)$ be the vector $(He^{\{k\}}(u, v_1), \ldots, He^{\{k\}}(u, v_n))$. The diffusion state distance (or DSD, for short) between two vertices $u$ and $v$ is defined as
$$DSD(u, v) = \lim_{k \to \infty} \left\| He^{\{k\}}(u) - He^{\{k\}}(v) \right\|_1,$$
provided the limit exists (see [12]). Here the $L_1$-norm is not essential. Generally, for $q \ge 1$, one can define the DSD $L_q$-distance as
$$DSD_q(u, v) = \lim_{k \to \infty} \left\| He^{\{k\}}(u) - He^{\{k\}}(v) \right\|_q$$


¹R. Boehnlein, P. Chin, A. Sinha, and L. Lu, Algorithms and Models for the Web Graph: 11th
International Workshop, WAW 2014, Beijing, China, December 17-18, 2014, Proceedings, Springer 
International Publishing, 8882 (2014), 79-95. Reprinted here with permission of Springer. 
provided the limit exists. (We use $L_q$ rather than $L_p$ to avoid confusion, as $p$ will be used as a probability throughout the paper.)

In [12], Cao et al. showed that the above limit always exists whenever the random walk on $G$ is ergodic (i.e., $G$ is a connected non-bipartite graph). They also proved that this distance can be computed by the following formula:

$$\mathrm{DSD}(u,v) = \left\|(\mathbf{1}_u - \mathbf{1}_v)\left(I - D^{-1}A + W\right)^{-1}\right\|_1,$$

where $D$ is the diagonal degree matrix, $A$ is the adjacency matrix, and $W$ is the constant matrix in which each row is a copy of $\pi$, where $\pi = \frac{1}{\sum_{i=1}^n d_i}(d_1, \ldots, d_n)$ is the unique steady state distribution.
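As a quick sanity check of this formula, here is a minimal Python sketch (our own illustration, not code from [12]) that evaluates $\|(\mathbf{1}_u-\mathbf{1}_v)(I - D^{-1}A + W)^{-1}\|_1$ exactly with rational arithmetic. It assumes a connected non-bipartite graph given as a 0/1 adjacency matrix; the helper names `solve_transposed` and `dsd_l1` are hypothetical. Instead of forming the inverse, it solves the row-vector system $x(I - D^{-1}A + W) = \mathbf{1}_u - \mathbf{1}_v$.

```python
from fractions import Fraction


def solve_transposed(M, b):
    """Solve x M = b for the row vector x (i.e. M^T x^T = b^T) by Gauss-Jordan elimination."""
    n = len(M)
    aug = [[M[j][i] for j in range(n)] + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * c for a, c in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def dsd_l1(adj, u, v):
    """DSD(u, v) = ||(1_u - 1_v)(I - D^{-1}A + W)^{-1}||_1 for a connected non-bipartite graph."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    vol = sum(deg)
    # M = I - D^{-1}A + W, where every row of W equals pi = (d_1, ..., d_n) / vol(G)
    M = [[Fraction(int(i == j)) - Fraction(adj[i][j], deg[i]) + Fraction(deg[j], vol)
          for j in range(n)] for i in range(n)]
    b = [Fraction(int(i == u) - int(i == v)) for i in range(n)]
    return sum(abs(c) for c in solve_transposed(M, b))


triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
print(dsd_l1(triangle, 0, 1))  # 4/3
```

For the triangle $K_3$, the matrix $I - D^{-1}A + W$ equals $\frac{3}{2}I - \frac{1}{6}J$, whose inverse is $\frac{2}{3}I + \frac{1}{9}J$, so the distance between any two vertices is $\frac{4}{3}$.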

A natural question is how to define the diffusion state distance for a bipartite graph. We suggest using the lazy random walk. For a given $\alpha \in (0,1)$, the walk stays at the current node $u$ with probability $\alpha$, and moves to each of its neighbors with probability $(1-\alpha)/d_u$. In other words, the transition matrix of the $\alpha$-lazy random walk is

$$T_\alpha = \alpha I + (1-\alpha)D^{-1}A.$$


Similarly, let $He_\alpha^{\{k\}}(u,v)$ be the expected number of times that the $\alpha$-lazy random walk starting at node $u$ and proceeding for $k$ steps will visit node $v$. Let $He_\alpha^{\{k\}}(u)$ be the vector $(He_\alpha^{\{k\}}(u,v_1), \ldots, He_\alpha^{\{k\}}(u,v_n))$. The $\alpha$-diffusion state $L_q$-distance between two vertices $u$ and $v$ is

$$\mathrm{DSD}_q^{\alpha}(u,v) = \lim_{k\to\infty}\left\| He_\alpha^{\{k\}}(u) - He_\alpha^{\{k\}}(v)\right\|_q.$$


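Theorem 3.1. For any connected graph $G$ and $\alpha \in (0,1)$, $\mathrm{DSD}_q^{\alpha}(u,v)$ is always well-defined and satisfies

$$\mathrm{DSD}_q^{\alpha}(u,v) = (1-\alpha)^{-1}\left\|(\mathbf{1}_u - \mathbf{1}_v)G\right\|_q. \qquad (3.1)$$

Here $G$ is the matrix of Green's function of $G$.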
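Theorem 3.1 can be checked numerically on a small bipartite example. The sketch below uses a hypothetical helper `lazy_walk_difference` (floating point, connected graph given as a 0/1 adjacency matrix). It propagates the row vector $\mathbf{1}_u - \mathbf{1}_v$ through $k$ steps of $T_\alpha$ and sums the results, which by linearity equals $He_\alpha^{\{k\}}(u) - He_\alpha^{\{k\}}(v)$. For the path $P_4$ with $u = 1$ and $v = 4$ (indices 0 and 3 in the code), multiplying by $1-\alpha$ should approach $(\mathbf{1}_1 - \mathbf{1}_4)G = (\frac{3}{2}, 1, -1, -\frac{3}{2})$, the vector given by Equation (3.7) with $n = 4$, whose $L_1$-norm is 5.

```python
def lazy_walk_difference(adj, u, v, alpha, k):
    """Return He_alpha^{k}(u) - He_alpha^{k}(v), summing the walk distributions over steps 0..k."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    # By linearity we can propagate the row vector 1_u - 1_v directly.
    dist = [float(i == u) - float(i == v) for i in range(n)]
    total = dist[:]
    for _ in range(k):
        new = [alpha * x for x in dist]  # stay put with probability alpha
        for i in range(n):
            share = (1 - alpha) * dist[i] / deg[i]  # move to each neighbor w.p. (1 - alpha)/d_i
            for j in range(n):
                if adj[i][j]:
                    new[j] += share
        dist = new
        total = [t + x for t, x in zip(total, dist)]
    return total


# P_4 is bipartite, so the simple (alpha = 0) walk does not converge.
path4 = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
alpha = 0.5
scaled = [(1 - alpha) * x for x in lazy_walk_difference(path4, 0, 3, alpha, 200)]
print([round(x, 6) for x in scaled])           # approximately [1.5, 1.0, -1.0, -1.5]
print(round(sum(abs(x) for x in scaled), 6))   # approximately 5.0
```

With $\alpha = 1/2$ the nonzero eigenvalues of $T_\alpha$ on $P_4$ are $0.75$, $0.25$ and $0$, so 200 steps are far more than enough for the partial sums to settle.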


Observe that $(1-\alpha)\mathrm{DSD}_q^{\alpha}(u,v)$ is independent of the choice of $\alpha$. Naturally, we define the DSD $L_q$-distance of any graph $G$ as

$$\mathrm{DSD}_q(u,v) = \lim_{\alpha\to 0}(1-\alpha)\mathrm{DSD}_q^{\alpha}(u,v) = \left\|(\mathbf{1}_u - \mathbf{1}_v)G\right\|_q.$$

This definition extends the original definition for non-bipartite graphs.

With a properly chosen $\alpha$, $\|He_\alpha^{\{k\}}(u) - He_\alpha^{\{k\}}(v)\|_q$ converges faster than $\|He^{\{k\}}(u) - He^{\{k\}}(v)\|_q$. This fact leads to a faster algorithm for estimating a single distance $\mathrm{DSD}_q(u,v)$ using random walks, which we discuss in Remark 3.2.

Green’s function was introduced in 1828 by George Green [41] to solve some partial 
differential equations, and it has found many applications (e.g. [6], [22],[16], [32], [47], 
[69]). 


The Green's function on graphs was first investigated by Chung and Yau [22] in 2000. Given a graph $G = (V,E)$ and a function $g: V \to \mathbb{R}$, consider the problem of finding $f$ satisfying the discrete Laplace equation

$$Lf(x) = \sum_{y\in V}(f(x) - f(y))p_{xy} = g(x).$$

Here $p_{xy}$ is the transition probability of the random walk from $x$ to $y$. Roughly speaking, Green's function is the left inverse operator of $L$ (for graphs with boundary). It is closely related to the heat kernel of the graph (see also [27]) and to the normalized Laplacian.

In this paper, we use Green's function to compute the DSD $L_q$-distance for various graphs. The maximum DSD $L_q$-distance varies from graph to graph. The maximum DSD $L_q$-distances for paths and cycles are of order $\Theta(n^{1+1/q})$, while the $L_q$-distances for some random graphs $G(n,p)$ and $G(w_1,\ldots,w_n)$ are constant for some ranges of $p$. The hypercubes lie somewhere between the two classes: the DSD $L_1$-distance is $\Omega(n)$, while the $L_q$-distance is $\Theta(1)$ for $q > 1$. Our method for random graphs is based on the strong concentration of the Laplacian eigenvalues.
The paper is organized as follows. In Section 3.2, we briefly review the terminology on the Laplacian eigenvalues, Green's function, and the heat kernel. Theorem 3.1 is proved in Section 3.3. In Section 3.4, we apply Green's function to calculate the DSD distance for various symmetric graphs such as paths, cycles, and hypercubes. We calculate the DSD $L_q$-distance for the random graphs $G(n,p)$ and $G(w_1, w_2, \ldots, w_n)$ in Section 3.5. In the last section, we examine two brain networks, a cat and a Rhesus monkey, and calculate the distributions of their DSD distances.


3.2 NOTATION 


In this paper, we only consider undirected simple graphs $G = (V,E)$ with vertex set $V$ and edge set $E$. For each vertex $x \in V$, the neighborhood of $x$, denoted by $N(x)$, is the set of vertices adjacent to $x$. The degree of $x$, denoted by $d_x$, is the cardinality of $N(x)$. We also denote the maximum degree by $\Delta$ and the minimum degree by $\delta$.

Without loss of generality, we assume that the set of vertices is ordered and that $V = [n] = \{1,2,\ldots,n\}$. Let $A$ be the adjacency matrix and $D = \mathrm{diag}(d_1,\ldots,d_n)$ be the diagonal matrix of degrees. For a given subset $S$, let the volume of $S$ be $\mathrm{vol}(S) := \sum_{i\in S} d_i$. In particular, we write $\mathrm{vol}(G) = \mathrm{vol}(V) = \sum_{i=1}^n d_i$.

Let $V^*$ be the linear space of all real functions on $V$. The discrete Laplace operator $L: V^* \to V^*$ is defined as

$$(Lf)(x) = \frac{1}{d_x}\sum_{y\in N(x)}\left(f(x) - f(y)\right).$$

The Laplace operator can also be written as an $(n\times n)$-matrix:

$$L = I - D^{-1}A.$$

Here $D^{-1}A$ is the transition probability matrix of the (uniform) random walk on $G$.


Note that $L$ is not symmetric. We consider a symmetric version

$$\mathcal{L} = I - D^{-1/2}AD^{-1/2} = D^{1/2}LD^{-1/2},$$

which is the so-called normalized Laplacian. Both $L$ and $\mathcal{L}$ have the same set of eigenvalues. The eigenvalues of $\mathcal{L}$ can be listed as

$$0 = \lambda_0 \le \lambda_1 \le \lambda_2 \le \cdots \le \lambda_{n-1} \le 2.$$

The eigenvalue $\lambda_1 > 0$ if and only if $G$ is connected, while (for connected $G$) $\lambda_{n-1} = 2$ if and only if $G$ is a bipartite graph. Let $\varphi_0, \varphi_1, \ldots, \varphi_{n-1}$ be a set of orthogonal unit eigenvectors. Here $\varphi_0 = \frac{1}{\sqrt{\mathrm{vol}(G)}}(\sqrt{d_1}, \ldots, \sqrt{d_n})$ is the positive unit eigenvector for $\lambda_0 = 0$, and $\varphi_i$ is the eigenvector for $\lambda_i$ ($1 \le i \le n-1$).


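Let $O = (\varphi_0, \ldots, \varphi_{n-1})$ and $\Lambda = \mathrm{diag}(0, \lambda_1, \ldots, \lambda_{n-1})$. Then $O$ is an orthogonal matrix and $\mathcal{L}$ can be diagonalized as

$$\mathcal{L} = O\Lambda O^T. \qquad (3.2)$$

Equivalently, we have

$$L = D^{-1/2}O\Lambda O^TD^{1/2}. \qquad (3.3)$$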


The Green's function $G$ is the matrix with its entries, indexed by vertices $x$ and $y$, defined by a set of two equations:

$$GL(x,y) = I(x,y) - \frac{d_y}{\mathrm{vol}(G)}, \qquad (3.4)$$

$$G\mathbf{1} = 0. \qquad (3.5)$$

(This is the so-called Green's function for graphs without boundary in [22].)
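A convenient way to compute $G$ for small graphs, sketched here as an illustration rather than as the method of [22], uses the fundamental matrix $Z = (I - D^{-1}A + W)^{-1}$ of the random walk, with $W$ as in the formula of [12] above. Since $(I - D^{-1}A + W)\mathbf{1} = \mathbf{1}$, we have $Z\mathbf{1} = \mathbf{1}$ and hence $ZW = W$; also $W(I - D^{-1}A) = 0$. Therefore $G := Z - W$ satisfies $GL = Z(I - D^{-1}A) = I - ZW = I - W$, which is (3.4), and $G\mathbf{1} = \mathbf{1} - \mathbf{1} = 0$, which is (3.5). Because $(\mathbf{1}_u - \mathbf{1}_v)W = 0$, this also shows $(\mathbf{1}_u - \mathbf{1}_v)G = (\mathbf{1}_u - \mathbf{1}_v)(I - D^{-1}A + W)^{-1}$, consistent with the formula of [12]. The Python sketch below (hypothetical helper names, exact `Fraction` arithmetic, connected graph assumed so the matrix is invertible) builds $G$ this way and checks both equations on the bipartite star $K_{1,3}$.

```python
from fractions import Fraction


def invert(M):
    """Exact Gauss-Jordan inverse of a nonsingular matrix of Fractions."""
    n = len(M)
    aug = [list(row) + [Fraction(int(i == j)) for j in range(n)] for i, row in enumerate(M)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [x / pv for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [a - f * c for a, c in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def green_function(adj):
    """Return (G, P, W) with G = (I - P + W)^{-1} - W, P = D^{-1}A, rows of W equal to d / vol(G)."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    vol = sum(deg)
    P = [[Fraction(adj[i][j], deg[i]) for j in range(n)] for i in range(n)]
    W = [[Fraction(deg[j], vol) for j in range(n)] for _ in range(n)]
    Z = invert([[int(i == j) - P[i][j] + W[i][j] for j in range(n)] for i in range(n)])
    return [[Z[i][j] - W[i][j] for j in range(n)] for i in range(n)], P, W


def satisfies_green_equations(G, P, W):
    n = len(G)
    for x in range(n):
        if sum(G[x]) != 0:  # (3.5): G 1 = 0
            return False
        for y in range(n):
            gl = G[x][y] - sum(G[x][w] * P[w][y] for w in range(n))  # (GL)(x, y), L = I - P
            if gl != int(x == y) - W[x][y]:  # (3.4)
                return False
    return True


star = [[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]  # bipartite K_{1,3}
print(satisfies_green_equations(*green_function(star)))  # True
```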


The normalized Green's function $\mathcal{G}$ is defined similarly:

$$\mathcal{G}\mathcal{L}(x,y) = I(x,y) - \frac{\sqrt{d_xd_y}}{\mathrm{vol}(G)}.$$

The matrices $G$ and $\mathcal{G}$ are related by

$$G = D^{-1/2}\mathcal{G}D^{1/2}.$$

Alternatively, $\mathcal{G}$ can be defined using the eigenvalues and eigenvectors of $\mathcal{L}$ as follows:

$$\mathcal{G} = O\Lambda^{+}O^T,$$

where $\Lambda^{+} = \mathrm{diag}(0, \lambda_1^{-1}, \ldots, \lambda_{n-1}^{-1})$. Thus, we have

$$G(x,y) = \sum_{l=1}^{n-1}\frac{1}{\lambda_l}\sqrt{\frac{d_y}{d_x}}\,\varphi_l(x)\varphi_l(y). \qquad (3.6)$$
For any real $t \ge 0$, the heat kernel $H_t$ is defined as

$$H_t = e^{-t\mathcal{L}}.$$

Thus,

$$H_t(x,y) = \sum_{l=0}^{n-1}e^{-\lambda_l t}\varphi_l(x)\varphi_l(y).$$

The heat kernel $H_t$ satisfies the heat equation

$$\frac{\partial}{\partial t}H_tf = -\mathcal{L}H_tf.$$

The relation of the heat kernel and Green's function is given by

$$\mathcal{G} = \int_0^\infty\left(H_t - \varphi_0\varphi_0^T\right)dt.$$

The heat kernel can be used to compute Green's function for the Cartesian product of two graphs. We omit the details here; readers are directed to [22] and [15] for further information.


3.3 PROOF OF THEOREM 3.1 


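Proof. Rewrite the transition probability matrix $T_\alpha$ as

$$\begin{aligned}
T_\alpha &= \alpha I + (1-\alpha)D^{-1}A\\
&= D^{-1/2}\left(\alpha I + (1-\alpha)D^{-1/2}AD^{-1/2}\right)D^{1/2}\\
&= D^{-1/2}\left(\alpha I + (1-\alpha)(I - \mathcal{L})\right)D^{1/2}\\
&= D^{-1/2}\left(I - (1-\alpha)\mathcal{L}\right)D^{1/2}.
\end{aligned}$$

For $l = 0, 1, \ldots, n-1$, let $\lambda_l^* = 1 - (1-\alpha)\lambda_l$ and $\Lambda^* = \mathrm{diag}(\lambda_0^*, \ldots, \lambda_{n-1}^*) = I - (1-\alpha)\Lambda$. Applying Equation (3.2), we get

$$T_\alpha = D^{-1/2}O\Lambda^*O^TD^{1/2} = (O^TD^{1/2})^{-1}\Lambda^*O^TD^{1/2}.$$

Then for any $t \ge 1$, the $t$-step transition matrix is $T_\alpha^t = (O^TD^{1/2})^{-1}(\Lambda^*)^tO^TD^{1/2} = D^{-1/2}O(\Lambda^*)^tO^TD^{1/2}$. Denote by $p_\alpha^{(t)}(u,j)$ the $(u,j)$ entry of $T_\alpha^t$. Since $\lambda_0^* = 1$ and $\varphi_0(u)\varphi_0(j) = \sqrt{d_ud_j}/\mathrm{vol}(G)$,

$$p_\alpha^{(t)}(u,j) = \sum_{l=0}^{n-1}(\lambda_l^*)^t\sqrt{\frac{d_j}{d_u}}\,\varphi_l(u)\varphi_l(j) = \frac{d_j}{\mathrm{vol}(G)} + \sum_{l=1}^{n-1}(\lambda_l^*)^t\sqrt{\frac{d_j}{d_u}}\,\varphi_l(u)\varphi_l(j).$$

Thus,

$$He_\alpha^{\{k\}}(u,j) - He_\alpha^{\{k\}}(v,j) = \sum_{t=0}^{k}\sum_{l=1}^{n-1}(\lambda_l^*)^t\sqrt{d_j}\,\varphi_l(j)\left(d_u^{-1/2}\varphi_l(u) - d_v^{-1/2}\varphi_l(v)\right).$$

The limit $\lim_{k\to\infty}\left(He_\alpha^{\{k\}}(u,j) - He_\alpha^{\{k\}}(v,j)\right)$ is a sum of $n-1$ geometric series:

$$\sum_{l=1}^{n-1}\sqrt{d_j}\,\varphi_l(j)\left(d_u^{-1/2}\varphi_l(u) - d_v^{-1/2}\varphi_l(v)\right)\sum_{t=0}^{\infty}(\lambda_l^*)^t.$$

Each geometric series converges since the common ratio $\lambda_l^* \in (-1,1)$: as $G$ is connected, $0 < \lambda_l \le 2$ for $1 \le l \le n-1$, so $2\alpha - 1 \le \lambda_l^* < 1$. Thus, using $1 - \lambda_l^* = (1-\alpha)\lambda_l$ and Equation (3.6),

$$\begin{aligned}
\lim_{k\to\infty}\left(He_\alpha^{\{k\}}(u,j) - He_\alpha^{\{k\}}(v,j)\right)
&= \sum_{l=1}^{n-1}\frac{\sqrt{d_j}\,\varphi_l(j)\left(d_u^{-1/2}\varphi_l(u) - d_v^{-1/2}\varphi_l(v)\right)}{1 - \lambda_l^*}\\
&= \frac{1}{1-\alpha}\sum_{l=1}^{n-1}\frac{1}{\lambda_l}\left(\sqrt{\frac{d_j}{d_u}}\,\varphi_l(u)\varphi_l(j) - \sqrt{\frac{d_j}{d_v}}\,\varphi_l(v)\varphi_l(j)\right)\\
&= \frac{1}{1-\alpha}\left(G(u,j) - G(v,j)\right).
\end{aligned}$$

We have

$$\lim_{k\to\infty}\left(He_\alpha^{\{k\}}(u) - He_\alpha^{\{k\}}(v)\right) = \frac{1}{1-\alpha}(\mathbf{1}_u - \mathbf{1}_v)G,$$

and (3.1) follows by taking the $L_q$-norm. $\square$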

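Remark 3.2. Observe that the convergence rate of $He_\alpha^{\{k\}}(u) - He_\alpha^{\{k\}}(v)$ is determined by

$$\lambda^* = \max\{1 - (1-\alpha)\lambda_1,\ (1-\alpha)\lambda_{n-1} - 1\}.$$

It is critical that we assume $\alpha \ne 0$. When $\alpha = 0$, $\lambda^* < 1$ holds only if $\lambda_{n-1} < 2$, i.e., $G$ is a non-bipartite graph.

When $\alpha = 1 - \frac{2}{\lambda_1 + \lambda_{n-1}}$, $\lambda^*$ (as a function of $\alpha$) achieves its minimum value $\frac{\lambda_{n-1} - \lambda_1}{\lambda_{n-1} + \lambda_1}$; this choice of $\alpha$ lies in $(0,1)$ whenever $\lambda_1 + \lambda_{n-1} > 2$. This is the best mixing rate that the $\alpha$-lazy random walk on $G$ can achieve. Using $\alpha$-lazy random walks (with $\alpha = 1 - \frac{2}{\lambda_1 + \lambda_{n-1}}$) to approximate the DSD $L_q$-distance will be faster than using regular random walks.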
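To illustrate Remark 3.2, the following Python sketch uses the normalized Laplacian eigenvalues of the cycle $C_n$, namely $1 - \cos(2\pi j/n)$ for $j = 0, \ldots, n-1$ (a standard fact, assumed here for the example), to compute the optimal laziness $\alpha = 1 - \frac{2}{\lambda_1 + \lambda_{n-1}}$ and the rate it achieves, and compares it with other choices of $\alpha$. The function names are hypothetical.

```python
import math


def cycle_eigenvalues(n):
    """Normalized Laplacian eigenvalues of C_n, sorted: 1 - cos(2*pi*j/n)."""
    return sorted(1 - math.cos(2 * math.pi * j / n) for j in range(n))


def mixing_rate(eigs, alpha):
    """lambda^* = max over the nonzero eigenvalues of |1 - (1 - alpha) * lambda|."""
    return max(abs(1 - (1 - alpha) * lam) for lam in eigs[1:])


def best_laziness(eigs):
    """alpha = 1 - 2/(lambda_1 + lambda_{n-1}) and the rate (lambda_{n-1} - lambda_1)/(lambda_{n-1} + lambda_1)."""
    lam1, lam_top = eigs[1], eigs[-1]
    return 1 - 2 / (lam1 + lam_top), (lam_top - lam1) / (lam_top + lam1)


eigs = cycle_eigenvalues(10)  # C_10 is bipartite, so lambda_{n-1} = 2
alpha, rate = best_laziness(eigs)
print(round(alpha, 4), round(rate, 4))
print(mixing_rate(eigs, 0.0))          # 1.0: the simple random walk does not mix
print(mixing_rate(eigs, 0.5) > rate)   # True
```

At the optimal $\alpha$ the two extreme eigenvalues $1 - (1-\alpha)\lambda_1$ and $1 - (1-\alpha)\lambda_{n-1}$ have equal absolute value, which is why `mixing_rate(eigs, alpha)` reproduces the closed-form rate.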
Equation (3.6) implies $\|G\|_2 \le \frac{1}{\lambda_1}\sqrt{\frac{\Delta}{\delta}}$. Combining this with Theorem 3.1, we have

Corollary 3.3. For any connected simple graph $G$ and any two vertices $u$ and $v$, we have $\mathrm{DSD}_2(u,v) \le \frac{\sqrt{2}}{\lambda_1}\sqrt{\frac{\Delta}{\delta}}$.

Note that for any connected graph $G$ with diameter $m$ (Lemma 1.9, [15]),

$$\lambda_1 \ge \frac{1}{m\,\mathrm{vol}(G)}.$$

This implies a uniform bound for the DSD $L_2$-distances on any connected graph $G$ on $n$ vertices:

$$\mathrm{DSD}_2(u,v) \le \sqrt{\frac{2\Delta}{\delta}}\;m\,\mathrm{vol}(G) < \sqrt{2}\,n^{3.5}.$$

This is a very coarse upper bound. But it does raise an interesting question: "How large can the DSD $L_q$-distance be?"


3.4 SOME EXAMPLES OF THE DSD DISTANCE


In this section, we use Green's function to compute the DSD $L_q$-distance (between two vertices whose distance equals the diameter) for paths, cycles, and hypercubes.


The path $P_n$


We label the vertices of $P_n$ as $1, 2, \ldots, n$ in sequential order. Chung and Yau computed the Green's function $G$ of the weighted path with no boundary (Theorem 9, [22]). It implies that Green's function of the path $P_n$ is given by: for any $u \le v$,

$$G(u,v) = \frac{d_v}{2(n-1)}\left((u-1)^2 + (n-v)^2 - \frac{2n^2-4n+3}{6}\right).$$

When $u > v$, applying $G(u,v) = \frac{d_v}{d_u}G(v,u)$ (which follows from $G = D^{-1/2}\mathcal{G}D^{1/2}$ and the symmetry of $\mathcal{G}$), we get

$$G(u,v) = \begin{cases}\dfrac{d_v}{2(n-1)}\left((u-1)^2 + (n-v)^2 - \dfrac{2n^2-4n+3}{6}\right) & \text{if } u \le v;\\[2ex] \dfrac{d_v}{2(n-1)}\left((v-1)^2 + (n-u)^2 - \dfrac{2n^2-4n+3}{6}\right) & \text{if } u > v.\end{cases}$$

We have

$$\begin{aligned}
G(1,1) &= \frac{4n^2-8n+3}{12(n-1)},\\
G(1,j) &= \frac{1}{n-1}\left((n-j)^2 - \frac{2n^2-4n+3}{6}\right) \quad\text{for } 2 \le j \le n-1,\\
G(1,n) &= -\frac{2n^2-4n+3}{12(n-1)},\\
G(n,1) &= -\frac{2n^2-4n+3}{12(n-1)},\\
G(n,j) &= \frac{1}{n-1}\left((j-1)^2 - \frac{2n^2-4n+3}{6}\right) \quad\text{for } 2 \le j \le n-1,\\
G(n,n) &= \frac{4n^2-8n+3}{12(n-1)}.
\end{aligned}$$
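Thus,

$$G(1,j) - G(n,j) = \begin{cases}\dfrac{n-1}{2} & \text{if } j = 1;\\[1ex] n+1-2j & \text{if } 2 \le j \le n-1;\\[1ex] -\dfrac{n-1}{2} & \text{if } j = n.\end{cases} \qquad (3.7)$$

Theorem 3.4. For any $q > 1$, the DSD $L_q$-distance of the path $P_n$ between $1$ and $n$ satisfies

$$\mathrm{DSD}_q(1,n) = (1+q)^{-1/q}\,n^{1+1/q} + O(n^{1/q}).$$

Proof.

$$\begin{aligned}
\mathrm{DSD}_q(1,n) &= \left(2\left(\frac{n-1}{2}\right)^q + \sum_{j=2}^{n-1}|n+1-2j|^q\right)^{1/q}\\
&= \left(\frac{1}{1+q}\,n^{q+1} + O(n^q)\right)^{1/q}\\
&= (1+q)^{-1/q}\,n^{1+1/q} + O(n^{1/q}). \qquad \square
\end{aligned}$$

For $q = 1$, we have the following exact result:

$$\mathrm{DSD}_1(1,n) = \sum_{j=1}^{n}|G(1,j) - G(n,j)| = \begin{cases}2k^2 - 2k + 1 & \text{if } n = 2k;\\ 2k^2 & \text{if } n = 2k+1.\end{cases}$$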
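These closed forms are easy to cross-check by machine. The Python sketch below (hypothetical helper names, exact `Fraction` arithmetic) compares Equation (3.7) with a direct solution of $x(I - D^{-1}A + W) = \mathbf{1}_1 - \mathbf{1}_n$, where $W$ is the matrix whose rows all equal $(d_1,\ldots,d_n)/\mathrm{vol}(G)$. This gives $x = (\mathbf{1}_1 - \mathbf{1}_n)G$ because $(G + W)(I - D^{-1}A + W) = I$, an identity that follows directly from (3.4) and (3.5), together with $(\mathbf{1}_1 - \mathbf{1}_n)W = 0$. The sketch then checks the exact $L_1$ values and the $q = 2$ asymptotics of Theorem 3.4.

```python
from fractions import Fraction


def path_difference(n):
    """Equation (3.7): the vector (G(1, j) - G(n, j)) for j = 1..n on the path P_n (n >= 2)."""
    half = Fraction(n - 1, 2)
    return [half] + [Fraction(n + 1 - 2 * j) for j in range(2, n)] + [-half]


def exact_dsd1(n):
    """DSD_1(1, n) on P_n: 2k^2 - 2k + 1 if n = 2k, and 2k^2 if n = 2k + 1."""
    k, r = divmod(n, 2)
    return 2 * k * k - 2 * k + 1 if r == 0 else 2 * k * k


def solved_difference(n):
    """Solve x (I - D^{-1}A + W) = 1_1 - 1_n on P_n, so that x = (1_1 - 1_n) G."""
    deg = [1] + [2] * (n - 2) + [1]
    vol = sum(deg)

    def p(i, j):  # transition probability of the simple random walk on P_n
        return Fraction(1, deg[i]) if abs(i - j) == 1 else Fraction(0)

    # Augmented rows of the transposed system M^T x = b, where M = I - P + W.
    aug = [[int(i == j) - p(j, i) + Fraction(deg[i], vol) for j in range(n)]
           + [Fraction(int(i == 0) - int(i == n - 1))] for i in range(n)]
    for col in range(n):  # Gauss-Jordan elimination
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col] / aug[col][col]
                aug[r] = [a - f * c for a, c in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


for n in range(2, 8):
    assert solved_difference(n) == path_difference(n)
    assert sum(abs(x) for x in path_difference(n)) == exact_dsd1(n)

# Theorem 3.4 with q = 2: the ratio below should be close to 1 for large n.
n, q = 2000, 2
dsd2 = float(sum(x * x for x in path_difference(n))) ** 0.5
print(dsd2 / ((1 + q) ** (-1 / q) * n ** (1 + 1 / q)))
```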


The cycle $C_n$


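Now we consider Green's function of the cycle $C_n$. For $x, y \in \{1, 2, \ldots, n\}$, let $|x-y|_c$ be the graph distance of $x, y$ in $C_n$. We have the following lemma.

Lemma 3.5. For even $n = 2k$, Green's function $G$ of $C_n$ is given by

$$G(x,y) = \frac{1}{2k}\left((k - |x-y|_c)^2 - \frac{2k^2+1}{6}\right).$$

For odd $n = 2k+1$, Green's function $G$ of $C_n$ is given by

$$G(x,y) = \frac{1}{2k+1}\left((k - |x-y|_c)(k+1-|x-y|_c) - \frac{k^2+k}{3}\right).$$

Proof. We only prove the even case here. The odd case is similar and is left to the reader.

For $n = 2k$, it suffices to verify that $G$ satisfies Equations (3.4) and (3.5). Since $C_n$ is 2-regular, $d_y/\mathrm{vol}(G) = 1/(2k)$, so to verify Equation (3.4) we need to show

$$G(x,y) - \frac{1}{2}\left(G(x,y-1) + G(x,y+1)\right) = \begin{cases}-\dfrac{1}{2k} & \text{if } x \ne y,\\[1ex] 1 - \dfrac{1}{2k} & \text{if } x = y.\end{cases}$$

Let $i = |x-y|_c$. For $x \ne y$ with $1 \le i \le k-1$, the two neighbors of $y$ are at distances $i-1$ and $i+1$ from $x$, and the constant term cancels, so the left-hand side equals

$$\frac{1}{2k}\left((k-i)^2 - \frac{1}{2}(k-i+1)^2 - \frac{1}{2}(k-i-1)^2\right) = -\frac{1}{2k}.$$

For $i = k$, both neighbors of $y$ are at distance $k-1$ from $x$, giving $\frac{1}{2k}(0 - 1) = -\frac{1}{2k}$; for $x = y$, both neighbors are at distance 1, giving $\frac{1}{2k}\left(k^2 - (k-1)^2\right) = 1 - \frac{1}{2k}$.

To verify Equation (3.5), it is enough to verify

$$1^2 + 2^2 + \cdots + (k-1)^2 + k^2 + (k-1)^2 + \cdots + 1^2 = \frac{k(2k^2+1)}{3}.$$

This can be done by induction on $k$. $\square$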
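Lemma 3.5 can also be verified exactly for small cycles, including the odd case left to the reader. The following Python sketch (hypothetical names; vertices indexed $0, \ldots, n-1$) evaluates both closed forms and checks (3.4) and (3.5) directly, using that $C_n$ is 2-regular so that $d_y/\mathrm{vol}(G) = 1/n$. It also prints the $L_1$-distance between the antipodal vertices $0$ and $4$ of $C_8$.

```python
from fractions import Fraction


def cycle_green(n):
    """Green's function of C_n from Lemma 3.5 (vertices 0..n-1, n >= 3)."""
    k = n // 2
    G = [[Fraction(0)] * n for _ in range(n)]
    for x in range(n):
        for y in range(n):
            i = min((x - y) % n, (y - x) % n)  # cyclic distance |x - y|_c
            if n % 2 == 0:
                G[x][y] = (Fraction((k - i) ** 2) - Fraction(2 * k * k + 1, 6)) / (2 * k)
            else:
                G[x][y] = (Fraction((k - i) * (k + 1 - i)) - Fraction(k * k + k, 3)) / (2 * k + 1)
    return G


def satisfies_3_4_and_3_5(G):
    """Check (3.4) and (3.5) on C_n, where L = I - A/2 and d_y / vol(G) = 1/n."""
    n = len(G)
    for x in range(n):
        if sum(G[x]) != 0:  # (3.5)
            return False
        for y in range(n):
            gl = G[x][y] - (G[x][(y - 1) % n] + G[x][(y + 1) % n]) / 2  # (GL)(x, y)
            if gl != int(x == y) - Fraction(1, n):  # (3.4)
                return False
    return True


print(all(satisfies_3_4_and_3_5(cycle_green(n)) for n in range(3, 13)))  # True
G8 = cycle_green(8)
print(sum(abs(G8[0][j] - G8[4][j]) for j in range(8)))  # 8
```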


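Theorem 3.6. For any $q > 1$, the DSD $L_q$-distance of the cycle $C_n$ between $1$ and $\lfloor n/2\rfloor + 1$ satisfies

$$\mathrm{DSD}_q\left(1, \lfloor n/2\rfloor + 1\right) = \left(\frac{4}{1+q}\right)^{1/q}\left(\frac{n}{4}\right)^{1+1/q} + O(n^{1/q}).$$

Proof. We only verify the case of the even cycle here. The odd cycle is similar and will be omitted.

For $n = 2k$, the difference of $G(1,j)$ and $G(1+k,j)$ has a simple form:

$$G(1,j) - G(1+k,j) = \frac{1}{2k}\left((k-i)^2 - i^2\right) = \frac{k}{2} - i,$$

where $i = |j-1|_c$. As $j$ runs over the cycle, $i$ takes the values $0, 1, 1, 2, 2, \ldots, k-1, k-1, k$. Thus,

$$\mathrm{DSD}_q(1, 1+k) = \left(\sum_{j=1}^{2k}\left|\frac{k}{2} - |j-1|_c\right|^q\right)^{1/q} = \left(\frac{4}{1+q}\right)^{1/q}\left(\frac{n}{4}\right)^{1+1/q} + O(n^{1/q}). \qquad \square$$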


The hypercube $Q_n$


Now we consider the hypercube $Q_n$, whose vertices are the binary strings of length $n$ and whose edges are pairs of vertices differing in exactly one coordinate. Chung and Yau [22] computed the Green's function of $Q_n$: for any two vertices $x$ and $y$ with distance $k$ in $Q_n$,

$$G(x,y) = 2^{-2n}\sum_{j=0}^{n-1}\frac{\left(\sum_{i=j+1}^{n}\binom{n}{i}\right)^2}{\binom{n-1}{j}} - 2^{-n}\sum_{j=0}^{k-1}\frac{\sum_{i=j+1}^{n}\binom{n}{i}}{\binom{n-1}{j}}. \qquad (3.8)$$

We are interested in the DSD distance between a pair of antipodal vertices. Let $\mathbf{0}$ denote the all-0 string and $\mathbf{1}$ denote the all-1 string. For any vertex $z$, if the distance between $\mathbf{0}$ and $z$ is $k$, then the distance between $\mathbf{1}$ and $z$ is $n-k$. We have

$$G(\mathbf{0},z) - G(\mathbf{1},z) = 2^{-n}\sum_{j=k}^{n-k-1}\frac{\sum_{i=j+1}^{n}\binom{n}{i}}{\binom{n-1}{j}}.$$

Here we use the convention that $\sum_{j=a}^{b}c_j = -\sum_{j=b+1}^{a-1}c_j$ for $b < a$.


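Theorem 3.7. For any $q \ge 1$, the DSD $L_q$-distance of the hypercube $Q_n$ between $\mathbf{0}$ and $\mathbf{1}$ satisfies

$$\mathrm{DSD}_q(\mathbf{0},\mathbf{1}) = \left(\sum_{k=0}^{n}\binom{n}{k}\left|2^{-n}\sum_{j=k}^{n-k-1}\frac{\sum_{i=j+1}^{n}\binom{n}{i}}{\binom{n-1}{j}}\right|^q\right)^{1/q}. \qquad (3.9)$$

In particular, $\mathrm{DSD}_q(\mathbf{0},\mathbf{1}) = \Theta(1)$ when $q > 1$, while $\mathrm{DSD}_1(\mathbf{0},\mathbf{1}) = \Omega(n)$.

Proof. Equation (3.9) follows from the definition of the DSD $L_q$-distance and Equation (3.8). Let

$$a_k = \binom{n}{k}\left|2^{-n}\sum_{j=k}^{n-k-1}\frac{\sum_{i=j+1}^{n}\binom{n}{i}}{\binom{n-1}{j}}\right|^q.$$

Observe that $a_k = a_{n-k}$, so we only need to estimate $a_k$ for $0 \le k \le n/2$. Also we can throw away the terms in the second summation for $j > n/2$, since that part is at most half of the summation. For $k \le j \le n/2$,

$$\frac{1}{4} \le 2^{-n}\sum_{i=j+1}^{n}\binom{n}{i} \le 1.$$

Thus $a_k$ has the same magnitude as $b_k := \binom{n}{k}\left(\sum_{j=k}^{n/2}\binom{n-1}{j}^{-1}\right)^q$.

For $q > 1$, we first bound $b_k$ by $b_k \le \binom{n}{k}\left(\frac{n}{\binom{n-1}{k}}\right)^q = O(n^{q-(q-1)k})$. When $k > \frac{q+2}{q-1}$, we have $b_k = O(n^{-2})$. The total contribution of those $b_k$'s is $O(n^{-1})$, which is negligible. Now consider the term $b_k$ for $k = 0, 1, \ldots, \lfloor\frac{q+2}{q-1}\rfloor$. We bound $b_k$ by

$$b_k \le \binom{n}{k}\left(\frac{1}{\binom{n-1}{k}} + \frac{n}{\binom{n-1}{k+1}}\right)^q = O(1).$$

This implies $\mathrm{DSD}_q(\mathbf{0},\mathbf{1}) = O(1)$. The lower bound $\mathrm{DSD}_q(\mathbf{0},\mathbf{1}) \ge 1$ is obtained by taking the term at $k = 0$. Putting these together, we have $\mathrm{DSD}_q(\mathbf{0},\mathbf{1}) = \Theta(1)$ for $q > 1$.

For $q = 1$, note that for every $k < n/2$,

$$a_k \ge \binom{n}{k}2^{-n}\frac{\sum_{i=k+1}^{n}\binom{n}{i}}{\binom{n-1}{k}} \ge \frac{1}{2}.$$

Thus, $\mathrm{DSD}_1(\mathbf{0},\mathbf{1}) = \Omega(n)$. $\square$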
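Equation (3.9) is easy to evaluate exactly for moderate $n$. The Python sketch below (hypothetical names; exact `Fraction` arithmetic until the final root) computes $\mathrm{DSD}_q(\mathbf{0},\mathbf{1})$ from (3.9), precomputing the tail sums $\sum_{i>j}\binom{n}{i}$. For $n = 1$ it reproduces $\mathrm{DSD}_1 = 1$, the exact path value for $P_2 \cong Q_1$, and for $n = 2$ it gives 2, the antipodal $L_1$-distance of $C_4 \cong Q_2$ that follows from Lemma 3.5. The printed values illustrate the contrast between $q = 1$ and $q = 2$ in Theorem 3.7.

```python
from fractions import Fraction
from math import comb


def hypercube_terms(n):
    """terms[j] = (sum_{i > j} C(n, i)) / C(n - 1, j) for j = 0..n-1."""
    terms, tail = [], 2 ** n
    for j in range(n):
        tail -= comb(n, j)  # now tail = sum_{i > j} C(n, i)
        terms.append(Fraction(tail, comb(n - 1, j)))
    return terms


def antipodal_difference(terms, n, k):
    """G(0, z) - G(1, z) for |z| = k, using the sign convention stated after (3.8)."""
    if k <= n - k - 1:
        s = sum(terms[k:n - k], Fraction(0))
    else:
        s = -sum(terms[n - k:k], Fraction(0))
    return s / 2 ** n


def hypercube_dsd(n, q):
    """DSD_q(0, 1) on Q_n via Equation (3.9)."""
    terms = hypercube_terms(n)
    total = sum(comb(n, k) * abs(antipodal_difference(terms, n, k)) ** q for k in range(n + 1))
    return float(total) ** (1.0 / q)


for n in (10, 20, 40):
    print(n, round(hypercube_dsd(n, 1), 3), round(hypercube_dsd(n, 2), 3))
```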


3.5 RANDOM GRAPHS 


In this section, we calculate the DSD $L_q$-distance in two random graph models. For random graphs, the non-zero Laplacian eigenvalues of a graph $G$ are often concentrated around 1. The following lemma is useful for estimating the DSD $L_q$-distance.


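Lemma 3.8. Let $\lambda_1, \ldots, \lambda_{n-1}$ be all non-zero Laplacian eigenvalues of a graph $G$. Suppose there is a small number $\varepsilon \in (0, 1/2)$ so that $|1 - \lambda_i| \le \varepsilon$ for $1 \le i \le n-1$. Then for any pair of vertices $u, v$, the DSD $L_q$-distance satisfies

$$\left|\mathrm{DSD}_q(u,v) - 2^{1/q}\right| \le \frac{\varepsilon}{1-\varepsilon}\sqrt{\Delta\left(\frac{1}{d_u} + \frac{1}{d_v}\right)} \quad\text{for } q \ge 2, \qquad (3.10)$$

$$\left|\mathrm{DSD}_q(u,v) - 2^{1/q}\right| \le n^{1/q - 1/2}\,\frac{\varepsilon}{1-\varepsilon}\sqrt{\Delta\left(\frac{1}{d_u} + \frac{1}{d_v}\right)} \quad\text{for } 1 \le q < 2. \qquad (3.11)$$

Proof. Rewrite the normalized Green's function $\mathcal{G}$ as

$$\mathcal{G} = I - \varphi_0\varphi_0^T + T.$$

Note that the eigenvalues of $T := \mathcal{G} - I + \varphi_0\varphi_0^T$ are $0, \frac{1}{\lambda_1} - 1, \ldots, \frac{1}{\lambda_{n-1}} - 1$. Observe that for each $i = 1, 2, \ldots, n-1$, $\left|\frac{1}{\lambda_i} - 1\right| = \frac{|1 - \lambda_i|}{\lambda_i} \le \frac{\varepsilon}{1-\varepsilon}$. We have

$$\|T\|_2 \le \frac{\varepsilon}{1-\varepsilon}.$$

Thus,

$$\begin{aligned}
\mathrm{DSD}_q(u,v) &= \left\|(\mathbf{1}_u - \mathbf{1}_v)D^{-1/2}\mathcal{G}D^{1/2}\right\|_q\\
&= \left\|(\mathbf{1}_u - \mathbf{1}_v)D^{-1/2}(I - \varphi_0\varphi_0^T + T)D^{1/2}\right\|_q\\
&\le \left\|(\mathbf{1}_u - \mathbf{1}_v)D^{-1/2}(I - \varphi_0\varphi_0^T)D^{1/2}\right\|_q + \left\|(\mathbf{1}_u - \mathbf{1}_v)D^{-1/2}TD^{1/2}\right\|_q.
\end{aligned}$$

Viewing $T$ as the error term, we first calculate the main term:

$$\left\|(\mathbf{1}_u - \mathbf{1}_v)D^{-1/2}(I - \varphi_0\varphi_0^T)D^{1/2}\right\|_q = \left\|(\mathbf{1}_u - \mathbf{1}_v)(I - W)\right\|_q = \left\|\mathbf{1}_u - \mathbf{1}_v\right\|_q = 2^{1/q}.$$

The $L_2$-norm of the error term can be bounded by

$$\left\|(\mathbf{1}_u - \mathbf{1}_v)D^{-1/2}TD^{1/2}\right\|_2 \le \left\|(\mathbf{1}_u - \mathbf{1}_v)D^{-1/2}\right\|_2\|T\|_2\left\|D^{1/2}\right\|_2 \le \sqrt{\frac{1}{d_u} + \frac{1}{d_v}}\cdot\frac{\varepsilon}{1-\varepsilon}\cdot\sqrt{\Delta}.$$

To get a bound on the $L_q$-norm from the $L_2$-norm, we apply the following relations between the $L_q$-norm and the $L_2$-norm to the error term. For any vector $x \in \mathbb{R}^n$,

$$\|x\|_q \le \|x\|_2 \quad\text{for } q \ge 2,$$

and

$$\|x\|_q \le n^{1/q - 1/2}\|x\|_2 \quad\text{for } 1 \le q < 2.$$

The inequalities (3.10) and (3.11) follow from the triangle inequality for the $L_q$-norm and the upper bound on the error term. $\square$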


Now we consider the classical Erdős-Rényi random graphs $G(n,p)$. For a given $n$ and $p \in (0,1)$, $G(n,p)$ is a random graph on the vertex set $\{1, 2, \ldots, n\}$ obtained by adding each pair $(i,j)$ to the edges of $G(n,p)$ with probability $p$ independently.

There are plenty of references on the concentration of the eigenvalues of $G(n,p)$ (for example, [23], [26], [58], and [60]). Here we list some facts on $G(n,p)$.

1. For $p \ge \frac{(1+c)\log n}{n}$ with a constant $c > 0$, almost surely $G(n,p)$ is connected.

2. For $p \gg \frac{\log n}{n}$, $G(n,p)$ is "almost regular"; namely, for all vertices $v$, $d_v = (1 + o_n(1))np$.

3. For $np(1-p) \gg \log^4 n$, all non-zero Laplacian eigenvalues $\lambda_i$ satisfy (see [60])

$$|1 - \lambda_i| \le \frac{3 + o_n(1)}{\sqrt{np}}. \qquad (3.12)$$

Applying Lemma 3.8 with $\varepsilon = \frac{3 + o_n(1)}{\sqrt{np}}$ and noting that $G(n,p)$ is almost regular, we get the following theorem.


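Theorem 3.9. For $p(1-p) \gg \frac{\log^4 n}{n}$, almost surely for all pairs of vertices $(u,v)$, the DSD $L_q$-distance of $G(n,p)$ satisfies

$$\mathrm{DSD}_q(u,v) = 2^{1/q} \pm O\left(\frac{1}{\sqrt{np}}\right) \quad\text{if } q \ge 2,$$

$$\mathrm{DSD}_q(u,v) = 2^{1/q} \pm O\left(\frac{n^{1/q - 1/2}}{\sqrt{np}}\right) \quad\text{if } 1 \le q < 2.$$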
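The concentration behind Theorem 3.9 can already be glimpsed on small samples. The following Python sketch (hypothetical names; floating-point Gaussian elimination; a fixed random seed) samples $G(n,p)$ with $n = 100$ and $p = 1/2$, computes $(\mathbf{1}_u - \mathbf{1}_v)G$ by solving $x(I - D^{-1}A + W) = \mathbf{1}_u - \mathbf{1}_v$, and compares $\mathrm{DSD}_2(u,v)$ with $\sqrt{2}$. Here $W$ has all rows equal to $(d_1,\ldots,d_n)/\mathrm{vol}(G)$, and the solve is valid because $(G + W)(I - D^{-1}A + W) = I$ follows from (3.4) and (3.5). These parameters are far from the asymptotic regime $p(1-p) \gg \log^4 n/n$, so this is an illustration, not a verification of the theorem.

```python
import math
import random


def sample_gnp(n, p, rng):
    """Adjacency matrix of G(n, p): each pair {i, j} is an edge independently with probability p."""
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i][j] = adj[j][i] = 1
    return adj


def green_row_difference(adj, u, v):
    """Return x = (1_u - 1_v) G by solving x (I - D^{-1}A + W) = 1_u - 1_v (floating point)."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    vol = sum(deg)
    # Row i of the transposed system: M^T[i][j] = M[j][i] = [i == j] - A[j][i] / d_j + d_i / vol.
    aug = [[float(i == j) - adj[j][i] / deg[j] + deg[i] / vol for j in range(n)]
           + [float(i == u) - float(i == v)] for i in range(n)]
    for col in range(n):  # forward elimination with partial pivoting
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            if f != 0.0:
                row_r, row_c = aug[r], aug[col]
                for c in range(col, n + 1):
                    row_r[c] -= f * row_c[c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))) / aug[i][i]
    return x


rng = random.Random(2014)
adj = sample_gnp(100, 0.5, rng)
x = green_row_difference(adj, 0, 1)
print(math.sqrt(sum(t * t for t in x)), math.sqrt(2))  # DSD_2(0, 1) versus 2^(1/2)
print(sum(abs(t) for t in x))                           # DSD_1(0, 1)
```

The entries of $x$ sum to zero up to rounding, since $G\mathbf{1} = 0$ by (3.5).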


Now we consider the random graphs with given expected degree sequence $G(w_1, \ldots, w_n)$ (see [8], [18], [17], [16], [48]). It is defined as follows:

1. Each vertex $i$ (for $1 \le i \le n$) is associated with a given positive weight $w_i$.

2. Let $\rho = \frac{1}{\sum_{i=1}^n w_i}$. For each pair of vertices $(i,j)$, $ij$ is added as an edge with probability $w_iw_j\rho$ independently. ($i$ and $j$ may be equal, so loops are allowed. Assume $w_iw_j\rho \le 1$ for all $i, j$.)
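A sketch of this sampling procedure in Python, with hypothetical function names: loops are included, each unordered pair $\{i, j\}$ with $i \le j$ is kept independently with probability $w_iw_j\rho$, and the code refuses weight sequences violating $w_iw_j\rho \le 1$. The helper `expected_degrees` evaluates $\sum_j w_iw_j\rho$, which equals $w_i$ when a loop is counted once toward the degree.

```python
import random


def sample_chung_lu(weights, rng):
    """Sample G(w_1, ..., w_n): each pair {i, j} with i <= j (loops allowed) is an edge
    independently with probability w_i * w_j * rho, where rho = 1 / sum(w)."""
    rho = 1.0 / sum(weights)
    n = len(weights)
    edges = []
    for i in range(n):
        for j in range(i, n):
            prob = weights[i] * weights[j] * rho
            if prob > 1:
                raise ValueError("the model assumes w_i * w_j * rho <= 1")
            if rng.random() < prob:
                edges.append((i, j))
    return edges


def expected_degrees(weights):
    """sum_j w_i w_j rho for each i, which equals w_i (a loop counts once)."""
    rho = 1.0 / sum(weights)
    return [sum(w_i * w_j * rho for w_j in weights) for w_i in weights]


def degrees(n, edges):
    """Degree of each vertex, counting a loop once."""
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        if j != i:
            deg[j] += 1
    return deg


weights = [2, 2, 3, 3]
rng = random.Random(7)
samples = [degrees(4, sample_chung_lu(weights, rng)) for _ in range(2000)]
print(expected_degrees(weights))
print([sum(s[i] for s in samples) / len(samples) for i in range(4)])  # close to the weights
```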


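Let $w_{\min}$ be the minimum weight and $w_{\max}$ the maximum weight. There are many references on the concentration of the eigenvalues of $G(w_1, \ldots, w_n)$ (see [19], [20], [23], [26], [60]). The version used here is in [60].

1. For each vertex $i$, the expected degree of $i$ is $w_i$.

2. Almost surely, for all $i$ with $w_i \gg \log n$, the degree satisfies $d_i = (1 + o(1))w_i$.

3. If $w_{\min} \gg \log^4 n$, all non-zero Laplacian eigenvalues $\lambda_i$ (for $1 \le i \le n-1$) satisfy

$$|1 - \lambda_i| = O\left(\frac{1}{\sqrt{w_{\min}}}\right). \qquad (3.13)$$

Theorem 3.10. Suppose $w_{\min} \gg \log^4 n$. Almost surely for all pairs of vertices $(u,v)$, the DSD $L_q$-distance of $G(w_1, \ldots, w_n)$ satisfies

$$\mathrm{DSD}_q(u,v) = 2^{1/q} \pm O\left(\sqrt{\frac{w_{\max}}{w_{\min}}\left(\frac{1}{w_u} + \frac{1}{w_v}\right)}\right) \quad\text{if } q \ge 2,$$

$$\mathrm{DSD}_q(u,v) = 2^{1/q} \pm O\left(n^{1/q - 1/2}\sqrt{\frac{w_{\max}}{w_{\min}}\left(\frac{1}{w_u} + \frac{1}{w_v}\right)}\right) \quad\text{if } 1 \le q < 2.$$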


3.6 EXAMPLES OF BIOLOGICAL NETWORKS 


In this section, we examine the distribution of the DSD distances for some biological networks. The set of graphs analyzed in this section includes three graphs of brain data from the Open Connectome Project [73] and two more graphs built from the S. cerevisiae PPI network and the S. pombe PPI network used in [12]. Figure 3.1 gives a visual representation of two of the brain data graphs: the graph of a cat and the graph of a Rhesus monkey. The network of the cat brain has 65 nodes and 1139 edges, while the network of the Rhesus monkey brain has 242 nodes and 4090 edges.

Each node in the Rhesus graph represents a region in the cerebral cortex originally analyzed in [46]. Each edge represents axonal connectivity between regions, and there is no distinction between strong and weak connections in this graph [46]. The Cat data set follows a similar pattern: each node represents a region of the brain and each edge represents a connection between two regions. The Cat data set represents 18 visual regions, 10 auditory regions, 18 somatomotor regions, and 19 frontolimbic regions [65].

Panels: Cat Brain Network; Rhesus Brain Network

Figure 3.1 The brain networks: (a) a Cat; (b) a Rhesus Monkey

For each network above, we calculated all-pair DSD $L_1$-distances. We divided the range of possible values into many small intervals and counted the number of pairs falling into each interval. The results are shown in Figure 3.2. The patterns are quite surprising to us.
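The binning step can be sketched in Python as follows. The number of intervals used for Figures 3.2 and 3.3 is not stated, so `num_bins` is a free parameter; equal-width intervals and the helper names are assumptions of this sketch. The input would be the list of all-pair distances, e.g. $\mathrm{DSD}_1(u,v)$ for $u < v$.

```python
def histogram(values, num_bins):
    """Split [min, max] into num_bins equal intervals and count the values in each."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins or 1.0  # guard against all values being equal
    counts = [0] * num_bins
    for x in values:
        index = min(int((x - lo) / width), num_bins - 1)  # the maximum goes in the last bin
        counts[index] += 1
    edges = [lo + b * width for b in range(num_bins + 1)]
    return edges, counts


def pair_values(dist):
    """All-pair values dist[u][v] with u < v from a symmetric n x n distance matrix."""
    n = len(dist)
    return [dist[u][v] for u in range(n) for v in range(u + 1, n)]


edges, counts = histogram([0.0, 0.5, 1.0, 1.0], 2)
print(edges, counts)  # [0.0, 0.5, 1.0] [1, 3]
```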


Panels: DSD Distribution of Cat Network; DSD Distribution of Rhesus Network

Figure 3.2 The distribution of the DSD $L_1$-distances of brain networks: (a) a Cat; (b) a Rhesus Monkey


Both graphs have a small interval containing many pairs, while the other values are more or less uniformly distributed. We think that this phenomenon might be caused by the clustering of a dense core. The two graphs have many branches sticking out. Since we are using the $L_1$-distance, the directions in which these branches stick out, when they are embedded into $\mathbb{R}^n$ using Green's function, do not matter.
When we change the $L_1$-distance to the $L_2$-distance, the pattern should be broken. This is confirmed in Figure 3.3. The actual distributions are mysterious to us.


Panels: DSD Distribution of Cat Network; DSD Distribution of Rhesus Network

Figure 3.3 The distribution of the DSD $L_2$-distances of brain networks: (a) a Cat; (b) a Rhesus Monkey
CHAPTER 4


NON-UNIFORM HYPERGRAPHS


For each $i \in [r]$, where $R = \{k_1, \ldots, k_r\}$, define $f_i(H)$ to be the number of $k_i$-edges belonging to $E(H)$. Also, for each $p_i \in [0,1]$, set $\alpha_i := -\log_n p_i$, so that $p_i = n^{-\alpha_i}$. Let $H := H^R$ be an $R$-graph on $v$ vertices. Define

$$\phi(H,p) = |V(H)| - \sum_{i=1}^{r}\alpha_i f_i(H),$$

and then,

$$q(H,p) = \min_{H'\subseteq H}\phi(H',p).$$
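For small $H$ the quantity $q(H,p)$ can be computed by brute force. The Python sketch below (hypothetical names; exact `Fraction` weights $\alpha_i$ indexed by edge size) assumes that the minimum ranges over nonempty subhypergraphs, since the empty one would force $q(H,p) \le 0$. Because every $\alpha_i \ge 0$, for a fixed vertex set $U$ the value $\phi(H',p)$ is smallest when $H'$ keeps all edges of $H$ inside $U$, so only induced subhypergraphs need to be scanned. The example is the hypergraph of Figure 4.3, with $E(H) = \{1, 4, 12, 23, 234\}$.

```python
from fractions import Fraction
from itertools import combinations


def phi(num_vertices, edges, alpha):
    """phi(H', p) = |V(H')| - sum_i alpha_i f_i(H'), with alpha indexed by edge size."""
    return num_vertices - sum((alpha[len(e)] for e in edges), Fraction(0))


def q_value(vertices, edges, alpha):
    """Minimum of phi over nonempty subhypergraphs; induced ones suffice when all alpha_i >= 0."""
    best = None
    for size in range(1, len(vertices) + 1):
        for U in combinations(vertices, size):
            inside = [e for e in edges if set(e) <= set(U)]
            value = phi(size, inside, alpha)
            if best is None or value < best:
                best = value
    return best


V = [1, 2, 3, 4]
E = [(1,), (4,), (1, 2), (2, 3), (2, 3, 4)]
half = {1: Fraction(1, 2), 2: Fraction(1, 2), 3: Fraction(1, 2)}
print(phi(len(V), E, half), q_value(V, E, half))  # 3/2 1/2
```

With all $\alpha_i = 1/2$ the minimum $1/2$ is attained by a single vertex carrying a 1-edge, so the 1-edges are the binding constraint in this example.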


We can now state the main result. 


Theorem 4.1. Let $G$ be the random hypergraph $G^R(n,p)$. For any $\varepsilon > 0$, for $n$ sufficiently large, we have:

1. If $q(H,p) < -\varepsilon$, then $G^R(n,p)$ is almost surely $H$-free.

2. If $q(H,p) > \varepsilon$, then almost surely $G^R(n,p)$ contains $(1 + o(1))\frac{n^{\phi(H,p)}}{|\mathrm{Aut}(H)|}$ copies of $H$.


Using the same technique as in Theorem 4.1, we can make a claim regarding extensions as well. First, let $f_i^S(H)$ denote the number of edges of $H$ of type $i$ contained in $H|_S$. Next, for each subhypergraph $H' \subseteq H$, define $\phi_S(H',p) := v - |S| - \sum_{i=1}^{r}\alpha_i\left(f_i(H') - f_i^S(H')\right)$, and $q_S(H,p) := \min_{H'\subseteq H}\phi_S(H',p)$.

Theorem 4.2. Suppose $G = G^R(n,p)$, and let a subset $S \subseteq V(G)$ be given. Assume $H|_S \subseteq G|_S$ almost surely. If $q_S(H,p) > \varepsilon$, then $H|_S$ can almost surely be extended to a copy of $H$ in $G$.
4.1 NOTATION, METHODS, AND EXAMPLES


It is not immediately clear how to extend the notions of 'balanced' or 'strictly balanced' to the non-uniform case. Consider the related extremal poset problem. One can ask what conditions are required in order to force a poset $P$ to appear in some set family $\mathcal{F}$. A necessary condition for $P$ to appear is that every subposet $P' \subseteq P$ appears in the family $\mathcal{F}$. With this analogy in mind, for $|R| > 1$, we propose a new definition for the hypergraph. Let $V_{H'}$ denote the number of vertices of a subhypergraph $H' \subseteq H$. A set of hypergraphs $H_1, \ldots, H_k$ with $H = \bigcup_{i=1}^{k}H_i$ is balanced if for all subgraphs $H' \subseteq H$ and all $1 \le j \le k$, $\sum_i\alpha_i f_i(H_j) \le V_{H_j}$ implies $\sum_i\alpha_i f_i(H') \le V_{H'}$, where each $\alpha_i \ge 0$.

As an example, we consider the hypergraph $H$ pictured in Figure 4.1. The 1-edges are represented with solid circles, and the 2-edges are represented in the customary way. The set $\{H_1, H_2\}$ constitutes the balanced subgraphs of $H$, which was determined by plotting the half-planes $\alpha_1f_1(H_i) + \alpha_2f_2(H_i) \le V_{H_i}$ for each subgraph $H_i \subseteq H$, as seen in Figure 4.2. Consider the intersection of these half-planes in the first quadrant, which is shaded. The lines that compose the boundary of this intersection correspond to the graphs in the balanced set. Lines that did not lie on the boundary of the image were omitted here. Now for any ordered pair $(\alpha_1, \alpha_2)$ lying in the shaded region of the plot, consider the probability vector $p = (n^{-\alpha_1}, n^{-\alpha_2})$. By Theorem 4.1, the random graph $G \sim G^R(n,p)$ almost surely contains a copy of $H$.




Figure 4.1 A hypergraph H with R = {1,2}, and its balanced subgraphs 


Now fix any hypergraph $H = H^R$ on $v$ vertices with the edge type $R = \{k_1, \ldots, k_r\}$.




Figure 4.2 The probabilities for which G(n, p) almost surely contains H 


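The goal is to count the number of copies of $H$ contained in the random hypergraph $G = G^R(n,p)$. For any $S \in \binom{[n]}{v}$ define an indicator variable

$$X_S = \begin{cases}1 & \text{if } G|_S \supseteq H,\\ 0 & \text{otherwise,}\end{cases}$$

indicating whether a copy of $H$ is contained inside $G|_S$. So set

$$X = \sum_S X_S.$$

Then,

$$E(X) = \binom{n}{v}\frac{v!}{|\mathrm{Aut}(H)|}\prod_{i=1}^{r}p_i^{f_i(H)},$$

where $p_i = 1/n^{\alpha_i}$.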
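The expectation formula can be evaluated directly for small examples. In this Python sketch (hypothetical names), $|\mathrm{Aut}(H)|$ is found by brute force over vertex permutations, and the probabilities $p_i$ are supplied as a dictionary indexed by edge size.

```python
from fractions import Fraction
from itertools import permutations
from math import comb, factorial


def automorphism_count(vertices, edges):
    """|Aut(H)|: vertex permutations mapping the edge set onto itself (brute force)."""
    edge_set = {frozenset(e) for e in edges}
    count = 0
    for image in permutations(vertices):
        relabel = dict(zip(vertices, image))
        if {frozenset(relabel[x] for x in e) for e in edge_set} == edge_set:
            count += 1
    return count


def expected_copies(n, vertices, edges, p):
    """E(X) = C(n, v) * v! / |Aut(H)| * prod_i p_i^{f_i(H)}, with p indexed by edge size."""
    v = len(vertices)
    prob = Fraction(1)
    for e in edges:
        prob *= p[len(e)]
    return comb(n, v) * Fraction(factorial(v), automorphism_count(vertices, edges)) * prob


V = [1, 2, 3, 4]
E = [(1,), (4,), (1, 2), (2, 3), (2, 3, 4)]  # the hypergraph of Figure 4.3
p = {1: Fraction(1, 2), 2: Fraction(1, 2), 3: Fraction(1, 2)}
print(automorphism_count(V, E), expected_copies(10, V, E, p))  # 1 315/2
```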


The proof depends on an inequality of Kim and Vu. First, set $\mu = E(X)$,

$$E' = \max_{H'}E\left(\frac{\partial X}{\partial H'}\right), \quad\text{and}\quad E = \max\{\mu, E'\}.$$

Theorem 4.3 ([55]). For any $\lambda > 1$, if $X$ is a polynomial of degree $k$, then

$$\Pr\left[|X - \mu| > a_k\sqrt{EE'}\,\lambda^k\right] < d_ke^{-\lambda}n^{k-1},$$

where $a_k = 8^k\sqrt{k!}$ and $d_k = 2e^2$.


To illustrate our methods more clearly, we must invoke derivatives. Take any injection $\varphi: V(H) \to [n]$. For each $F \in E(H)$, define an indicator variable $t_{\varphi(F)}$ to be 1 if $\varphi(F)$ is an edge in $G$, and 0 otherwise. Hence, we can write $X$, the number of copies of $H$ in $G$, as the polynomial

$$X = \sum_{\varphi}\prod_{F\in E(H)}t_{\varphi(F)}.$$

Next, we may define the partial derivatives of $X$ as follows. Suppose $H' \subseteq H$ is a subgraph spanning $V(H)$. Let $S = S(H')$ denote the set of vertices incident to an edge in $E(H')$. We then restrict our attention to those subgraphs $H'$ such that $H[S] \subseteq H' \subseteq H$. Let $\varphi^*$ denote those maps that fix $S(H')$. Then define:

$$\frac{\partial X}{\partial H'} = \frac{\partial X}{\prod_{F\in E(H')}\partial t_{\varphi(F)}} = \sum_{\varphi^*}\prod_{F\in E(H)\setminus E(H')}t_{\varphi^*(F)}.$$
Now we examine an example hypergraph $H$ with $V(H) = \{1,2,3,4\}$ and $E(H) = \{1, 4, 12, 23, 234\}$, as shown in Figure 4.3. The shaded triangle represents the 3-edge.

Figure 4.3 A hypergraph with R = {1, 2, 3}


Say $\varphi$ maps $(1,2,3,4) \mapsto (i,j,k,l)$. So $X = \sum_\varphi t_it_lt_{ij}t_{jk}t_{jkl}$. Now consider the (vertex spanning) subgraph $H'$ given by $E(H') = \{1, 12, 23\}$. Then $S(H') = \{1,2,3\}$. Taking the derivative of $X$ with respect to $H'$, we obtain:

$$\frac{\partial X}{\partial H'} = \frac{\partial X}{\partial t_1\,\partial t_{12}\,\partial t_{23}} = \sum_{\varphi^*}t_{\varphi^*(4)}\,t_{\varphi^*(2)\varphi^*(3)\varphi^*(4)},$$

with expected value

$$E\left(\frac{\partial X}{\partial H'}\right) = \sum_{\varphi^*}p_1p_3 \approx np_1p_3 = n^{1-\alpha_1-\alpha_3},$$

as we have about $n$ choices for $\varphi^*(4)$, the only vertex whose destination is not already determined by our choice of $H'$. For another example, we consider the subgraph $H''$ given by $E(H'') = \{23, 234, 4\}$. Here, we obtain:

$$\frac{\partial X}{\partial H''} = \frac{\partial X}{\partial t_{23}\,\partial t_{234}\,\partial t_4} = \sum_{\varphi^*}t_{\varphi^*(1)}\,t_{\varphi^*(1)\varphi^*(2)}, \qquad E\left(\frac{\partial X}{\partial H''}\right) \approx np_1p_2 = n^{1-\alpha_1-\alpha_2}.$$
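The two expected derivatives above can be reproduced exactly. The Python sketch below (hypothetical names) fixes the vertices $S(H')$, counts the injective placements of the remaining $v - |S(H')|$ vertices among the other $n - |S(H')|$ vertices of $G$ (the text rounds this count to $n$), and multiplies by the probabilities of the edges in $E(H) \setminus E(H')$, given as a dictionary indexed by edge size.

```python
from fractions import Fraction


def expected_derivative(n, vertices, edges, sub_edges, p):
    """E(dX/dH') for X = sum over injections phi of prod_F t_{phi(F)}.

    The vertices S(H') covered by sub_edges stay fixed, the other v - |S| vertices are
    placed injectively among the remaining n - |S| vertices of G, and every edge of E(H)
    not in E(H') must be present, independently with probability p[edge size]."""
    fixed = set().union(*(set(e) for e in sub_edges))
    free = len(vertices) - len(fixed)
    placements = 1
    for i in range(free):
        placements *= n - len(fixed) - i
    removed = {frozenset(e) for e in sub_edges}
    prob = Fraction(1)
    for e in edges:
        if frozenset(e) not in removed:
            prob *= p[len(e)]
    return placements * prob


V = [1, 2, 3, 4]
E = [(1,), (4,), (1, 2), (2, 3), (2, 3, 4)]
p = {1: Fraction(1, 2), 2: Fraction(1, 3), 3: Fraction(1, 5)}
print(expected_derivative(10, V, E, [(1,), (1, 2), (2, 3)], p))     # (n - 3) p_1 p_3 = 7/10
print(expected_derivative(10, V, E, [(2, 3), (2, 3, 4), (4,)], p))  # (n - 3) p_1 p_2 = 7/6
```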


4.2 PROOF OF THEOREM 4.1 


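Taking $p_i \sim 1/n^{\alpha_i}$, recall that

$$\begin{aligned}
E(X) &= \binom{n}{v}\frac{v!}{|\mathrm{Aut}(H)|}\prod_{i=1}^{r}p_i^{f_i(H)}\\
&= (1+o(1))\frac{n^v}{|\mathrm{Aut}(H)|}\,n^{-\sum_{i=1}^{r}\alpha_if_i(H)}\\
&= (1+o(1))\frac{n^{\phi(H,p)}}{|\mathrm{Aut}(H)|}.
\end{aligned}$$

Now if $q(H,p) < -\varepsilon < 0$, then there exists a subgraph $H' \subseteq H$ such that

$$\phi(H',p) = |V(H')| - \sum_{i=1}^{r}\alpha_if_i(H') < -\varepsilon.$$

Let $X(H')$ denote the number of copies of $H'$ in $G$. Then,

$$E(X(H')) = (1+o(1))\frac{n^{\phi(H',p)}}{|\mathrm{Aut}(H')|} < (1+o(1))\,n^{-\varepsilon} = o(1).$$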
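That is, $G$ almost surely does not contain a copy of $H'$, and thus almost surely does not contain $H$. For the other case, when $\phi(H',p) > \varepsilon > 0$ for all subgraphs $H'$, we require some more machinery. Recalling the definitions of our derivatives, $E$, and $E'$, observe that:

$$E \ge \mu = (1+o(1))\frac{n^{\phi(H,p)}}{|\mathrm{Aut}(H)|}.$$

Now for any subgraph $H' \subseteq H$, write $S = S(H')$, so that all edges of $H'$ lie in the subhypergraph $H'[S]$ on $|S|$ vertices. Then

$$\begin{aligned}
E\left(\frac{\partial X}{\partial H'}\right) &= E\left(\sum_{\varphi^*}\prod_{F\in E(H)\setminus E(H')}t_{\varphi^*(F)}\right)\\
&\le n^{v-|S|}\prod_{i=1}^{r}p_i^{f_i(H)-f_i(H')}\\
&= n^{\phi(H,p)-\phi(H'[S],p)}\\
&< n^{\phi(H,p)-\varepsilon}.
\end{aligned}$$

Thus, $E' \le (1+o(1))|\mathrm{Aut}(H)|\,\mu\,n^{-\varepsilon}$. But since $\mu = (1+o(1))n^{\phi(H,p)}|\mathrm{Aut}(H)|^{-1}$, by Theorem 4.3,

$$\Pr\left[|X - \mu| > 8^v\sqrt{v!\,|\mathrm{Aut}(H)|}\,\mu\,n^{-\varepsilon/2}\lambda^v\right] \le 2e^2e^{-\lambda}n^{v-1}.$$

Now taking $\lambda = v\ln n$, we obtain:

$$\Pr\left[|X - \mu| > 8^v\left(v!\,|\mathrm{Aut}(H)|\right)^{1/2}\mu\,n^{-\varepsilon/2}v^v(\ln n)^v\right] \le 2e^2n^{-1}.$$

However, for $\varepsilon > 0$, note that $8^v\left(v!\,|\mathrm{Aut}(H)|\right)^{1/2}n^{-\varepsilon/2}v^v(\ln n)^v \to 0$ as $n \to \infty$, so the deviation is $o(\mu)$. Therefore in this case we expect to find $(1+o(1))\mu$ copies of $H$ inside $G$.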
4.3 PROOF OF THEOREM 4.2


Fix any subset $S \subseteq V(G)$. We wish to determine a sufficient condition for us to extend $H|_S$ to a copy of $H$ contained in $G$ for some given hypergraph $H$. To be more precise, we adopt a rigorous definition similar to that for the classic graph case in [66] and [67]. Assume $H|_S \subseteq G|_S$ as subgraphs. We say $H|_S$ can be extended to a copy of $H$ in $G$ if there exists a set $Z \subseteq V(G)$ such that for each $F \subseteq S \cup Z$, if $F \in E(H)$ then $F \in E(G)$.

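Let $\psi: V(H) \to V(G)$ be an injection such that $\psi|_S$ is the identity map. Let $Y$ denote the number of such extensions. As before, let $|V(G)| = n$ and $|V(H)| = v$. Then,

$$Y = \sum_{\psi}\prod_{F\in E(H)\setminus E(H|_S)}t_{\psi(F)}.$$

Now recall that $f_i^S(H)$ denotes the number of edges of $H$ of type $i$ contained in $H|_S$,

$$\phi_S(H',p) = v - |S| - \sum_{i=1}^{r}\alpha_i\left(f_i(H') - f_i^S(H')\right), \quad\text{and}\quad q_S(H,p) = \min_{H'\subseteq H}\phi_S(H',p).$$

Thus,

$$\begin{aligned}
E(Y) &= E\left(\sum_{\psi}\prod_{F\in E(H)\setminus E(H|_S)}t_{\psi(F)}\right)\\
&= \binom{n-|S|}{v-|S|}\frac{(v-|S|)!}{|\mathrm{Aut}(H|_{S^c})|}\prod_{i=1}^{r}p_i^{f_i(H)-f_i^S(H)}\\
&= (1+o(1))\frac{n^{\phi_S(H,p)}}{|\mathrm{Aut}(H|_{S^c})|}.
\end{aligned}$$

So set $\mu_Y = E(Y)$. Next we turn our attention to the partial derivatives of $Y$. In light of our earlier definitions of derivatives, the expected number of ways to extend $H|_S$ to a copy of $H$, given a subgraph $H'$ already in place, is $E\left(\frac{\partial X}{\partial(H|_S\cup H')}\right) = E\left(\frac{\partial Y}{\partial H'}\right)$. So we can approximate

$$E'_Y = \max_{H'}E\left(\frac{\partial Y}{\partial H'}\right) = \max_{H'}E\left(\frac{\partial X}{\partial(H|_S\cup H')}\right).$$

As in the proof of Theorem 4.1, the condition $q_S(H,p) > \varepsilon$ gives $E'_Y \le (1+o(1))|\mathrm{Aut}(H|_{S^c})|\,\mu_Y\,n^{-\varepsilon}$. Set $E_Y = \max\{\mu_Y, E'_Y\}$. Then by Theorem 4.3,

$$\Pr\left[|Y - \mu_Y| > 8^{v-|S|}\sqrt{(v-|S|)!\,|\mathrm{Aut}(H|_{S^c})|}\,\mu_Y\,n^{-\varepsilon/2}\lambda^{v-|S|}\right] \le 2e^2e^{-\lambda}n^{v-|S|-1}.$$

Now taking $\lambda = (v - |S|)\ln n$,

$$\Pr\left[|Y - \mu_Y| > 8^{v-|S|}\left((v-|S|)!\,|\mathrm{Aut}(H|_{S^c})|\right)^{1/2}\mu_Y\,n^{-\varepsilon/2}(v-|S|)^{v-|S|}(\ln n)^{v-|S|}\right] \le 2e^2n^{-1}.$$

However, for $\varepsilon > 0$, note that $8^{v-|S|}\left((v-|S|)!\,|\mathrm{Aut}(H|_{S^c})|\right)^{1/2}(v-|S|)^{v-|S|}n^{-\varepsilon/2}(\ln n)^{v-|S|} \to 0$ as $n \to \infty$. Thus, we can almost certainly extend $H|_S$ into a copy of $H$ inside $G$. In particular, we expect to find $(1+o(1))\mu_Y$ many extensions of $H|_S$.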


4.4 IMPROVING THEOREM 4.1 


In Theorem 4.1, $q(H,p) > \varepsilon > 0$ was a sufficient condition for obtaining copies of $H$ in $G^R(n,p)$. It turns out that we can invoke Chebyshev's inequality to lower this threshold to $o(1)$. Let $\omega(n)$ denote a function that tends slowly to $\infty$ as $n \to \infty$.

Theorem 4.4. If $q(H,p) \ge \frac{\log\omega(n)}{\log n}$, then almost surely $G^R(n,p)$ contains $(1+o(1))\frac{n^{\phi(H,p)}}{|\mathrm{Aut}(H)|}$ copies of $H$.


Theorem 4.4 follows from the following lemma. 


Lemma 4.5. If $E(X) = \Omega(\omega(n))$, then

$$\Pr\left(|X - E(X)| > \lambda\right) = o(1),$$

where $\lambda = \max\left\{\frac{E(X)}{\sqrt[4]{n}}, \sqrt{E(X)\,\omega(n)}\right\}$.


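Proof. Observe that $\mathrm{Var}(X_S) \le E(X_S)$ for each $S$, since $X_S$ is an indicator variable.

If $|S \cap S'| \le 1$, then $X_S$ and $X_{S'}$ are independent, in which case $\mathrm{Covar}(X_S, X_{S'}) = 0$. Hence,

$$\begin{aligned}
\sum_{S\ne S'}\mathrm{Covar}(X_S, X_{S'}) &\le \sum_{\substack{S\ne S'\\ |S\cap S'|\ge 2}}\left|\mathrm{Covar}(X_S, X_{S'})\right|\\
&= \sum_{\substack{S\ne S'\\ |S\cap S'|\ge 2}}\left|E(X_SX_{S'}) - E(X_S)E(X_{S'})\right|\\
&\le \sum_{S}E(X_S)\sum_{\substack{S'\ne S\\ |S\cap S'|\ge 2}}\left|E(X_{S'}\mid X_S = 1) - E(X_{S'})\right|\\
&\le \sum_{S}E(X_S)\sum_{\substack{S'\ne S\\ |S\cap S'|\ge 2}}E(X_{S'}\mid X_S = 1) + \sum_{S}E(X_S)\sum_{\substack{S'\ne S\\ |S\cap S'|\ge 2}}E(X_{S'})\\
&\le E(X)\,n^{-q(H,p)} + \frac{1}{n}\left(E(X)\right)^2.
\end{aligned}$$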
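Putting these together,

$$\begin{aligned}
\mathrm{Var}(X) &= \mathrm{Var}\Big(\sum_S X_S\Big) &(4.1)\\
&= \sum_S\mathrm{Var}(X_S) + \sum_{S\ne S'}\mathrm{Covar}(X_S, X_{S'}) &(4.2)\\
&\le E(X)\left(1 + n^{-q(H,p)}\right) + \frac{1}{n}\left(E(X)\right)^2. &(4.3)
\end{aligned}$$

Now Chebyshev's inequality states that for $\lambda > 0$,

$$\Pr\left(|X - E(X)| \ge \lambda\right) \le \frac{\mathrm{Var}(X)}{\lambda^2}. \qquad (4.4)$$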


Choose $\lambda = \max\left\{\frac{E(X)}{\sqrt[4]{n}}, \sqrt{E(X)\,\omega(n)}\right\}$. Thus $\lambda = o(E(X))$. Hence, applying (4.3) to (4.4) implies